MPC750UM/D 12/2001 Rev. 1
MPC750 RISC Microprocessor Family User's Manual
Devices Supported: MPC755 MPC750 MPC745 MPC740
HOW TO REACH US:
USA/EUROPE/LOCATIONS NOT LISTED: Motorola Literature Distribution; P.O. Box 5405, Denver, Colorado 80217 1-303-675-2140 or 1-800-441-2447
JAPAN: Motorola Japan Ltd.; SPS, Technical Information Center, 3-20-1, Minami-Azabu Minato-ku, Tokyo 106-8573 Japan 81-3-3440-3569
ASIA/PACIFIC: Motorola Semiconductors H.K. Ltd.; Silicon Harbour Centre, 2 Dai King Street, Tai Po Industrial Estate, Tai Po, N.T., Hong Kong 852-26668334
TECHNICAL INFORMATION CENTER: 1-800-521-6274
HOME PAGE: http://www.motorola.com/semiconductors
DOCUMENT COMMENTS: FAX (512) 933-2625, Attn: RISC Applications Engineering
Information in this document is provided solely to enable system and software implementers to use Motorola products. There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document. Motorola reserves the right to make changes without further notice to any products herein. Motorola makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Motorola assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters which may be provided in Motorola data sheets and/or specifications can and do vary in different applications and actual performance may vary over time. All operating parameters, including "Typicals" must be validated for each customer application by customer's technical experts. Motorola does not convey any license under its patent rights nor the rights of others. Motorola products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Motorola product could create a situation where personal injury or death may occur. Should Buyer purchase or use Motorola products for any such unintended or unauthorized application, Buyer shall indemnify and hold Motorola and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola was negligent regarding the design or manufacture of the part.
Motorola and the Stylized M Logo are registered in the U.S. Patent and Trademark Office. digital dna is a trademark of Motorola, Inc. All other product or service names are the property of their respective owners. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer. (c) Motorola, Inc. 2001
1 Overview
2 Programming Model
3 Cache
4 Exceptions
5 Memory Management Unit
6 Instruction Timing
7 Signals
8 System Interface
9 L2 Cache Interface
10 Power Management
11 Performance Monitor
A Instruction Set Listings
B Invalid Instructions
C MPC755 Microprocessor
D User's Manual Revision History
IND Index
Contents
Paragraph Section Number Title Page Number
About This Book
Audience .... xxxii
Organization .... xxxiii
Suggested Reading .... xxxiv
General Information .... xxxiv
Related Documentation .... xxxiv
Conventions .... xxxv
Acronyms and Abbreviations .... xxxvi
Terminology Conventions .... xxxix
Chapter 1 Overview
1.1 MPC750 Microprocessor Overview .... 1-1
1.2 MPC750 Microprocessor Features .... 1-4
1.2.1 Overview of the MPC750 Microprocessor Features .... 1-4
1.2.2 Instruction Flow .... 1-7
1.2.2.1 Instruction Queue and Dispatch Unit .... 1-8
1.2.2.2 Branch Processing Unit (BPU) .... 1-8
1.2.2.3 Completion Unit .... 1-9
1.2.2.4 Independent Execution Units .... 1-10
1.2.2.4.1 Integer Units (IUs) .... 1-10
1.2.2.4.2 Floating-Point Unit (FPU) .... 1-10
1.2.2.4.3 Load/Store Unit (LSU) .... 1-11
1.2.2.4.4 System Register Unit (SRU) .... 1-11
1.2.3 Memory Management Units (MMUs) .... 1-12
1.2.4 On-Chip Instruction and Data Caches .... 1-13
1.2.5 L2 Cache Implementation (Not Supported in the MPC740) .... 1-14
1.2.6 System Interface/Bus Interface Unit (BIU) .... 1-15
1.2.7 Signals .... 1-16
1.2.8 Signal Configuration .... 1-17
1.2.9 Clocking .... 1-18
1.3 MPC750 Microprocessor Implementation .... 1-19
1.4 PowerPC Registers and Programming Model .... 1-21
1.5 Instruction Set .... 1-25
1.5.1 PowerPC Instruction Set .... 1-25
1.5.2 MPC750 Microprocessor Instruction Set .... 1-27
1.6 On-Chip Cache Implementation .... 1-27
1.6.1 PowerPC Cache Model .... 1-28
1.6.2 MPC750 Microprocessor Cache Implementation .... 1-28
1.7 Exception Model .... 1-28
1.7.1 PowerPC Exception Model .... 1-28
1.7.2 MPC750 Microprocessor Exception Implementation .... 1-30
1.8 Memory Management .... 1-31
1.8.1 PowerPC Memory Management Model .... 1-32
1.8.2 MPC750 Microprocessor Memory Management Implementation .... 1-32
1.9 Instruction Timing .... 1-33
1.10 Power Management .... 1-35
1.11 Thermal Management .... 1-36
1.12 Performance Monitor .... 1-37
Chapter 2 Programming Model
2.1 The MPC750 Processor Register Set .... 2-1
2.1.1 Register Set .... 2-2
2.1.2 MPC750-Specific Registers .... 2-9
2.1.2.1 Instruction Address Breakpoint Register (IABR) .... 2-9
2.1.2.2 Hardware Implementation-Dependent Register 0 .... 2-10
2.1.2.3 Hardware Implementation-Dependent Register 1 .... 2-14
2.1.2.4 Performance Monitor Registers .... 2-14
2.1.2.4.1 Monitor Mode Control Register 0 (MMCR0) .... 2-15
2.1.2.4.2 User Monitor Mode Control Register 0 (UMMCR0) .... 2-16
2.1.2.4.3 Monitor Mode Control Register 1 (MMCR1) .... 2-17
2.1.2.4.4 User Monitor Mode Control Register 1 (UMMCR1) .... 2-17
2.1.2.4.5 Performance Monitor Counter Registers (PMC1-PMC4) .... 2-17
2.1.2.4.6 User Performance Monitor Counter Registers (UPMC1-UPMC4) .... 2-21
2.1.2.4.7 Sampled Instruction Address Register (SIA) .... 2-21
2.1.2.4.8 User Sampled Instruction Address Register (USIA) .... 2-21
2.1.2.4.9 Sampled Data Address Register (SDA) and User Sampled Data Address Register (USDA) .... 2-21
2.1.3 Instruction Cache Throttling Control Register (ICTC) .... 2-22
2.1.4 Thermal Management Registers (THRM1-THRM3) .... 2-22
2.1.5 L2 Cache Control Register (L2CR) .... 2-25
2.1.6 Reset Settings .... 2-27
2.2 Operand Conventions .... 2-28
2.2.1 Floating-Point Execution Models--UISA .... 2-28
2.2.2 Data Organization in Memory and Data Transfers .... 2-29
2.2.3 Alignment and Misaligned Accesses .... 2-29
2.2.4 Floating-Point Operand .... 2-30
2.3 Instruction Set Summary .... 2-31
2.3.1 Classes of Instructions .... 2-33
2.3.1.1 Definition of Boundedly Undefined .... 2-33
2.3.1.2 Defined Instruction Class .... 2-33
2.3.1.3 Illegal Instruction Class .... 2-34
2.3.1.4 Reserved Instruction Class .... 2-35
2.3.2 Addressing Modes .... 2-35
2.3.2.1 Memory Addressing .... 2-35
2.3.2.2 Memory Operands .... 2-35
2.3.2.3 Effective Address Calculation .... 2-36
2.3.2.4 Synchronization .... 2-36
2.3.2.4.1 Context Synchronization .... 2-36
2.3.2.4.2 Execution Synchronization .... 2-37
2.3.2.4.3 Instruction-Related Exceptions .... 2-37
2.3.3 Instruction Set Overview .... 2-38
2.3.4 PowerPC UISA Instructions .... 2-38
2.3.4.1 Integer Instructions .... 2-38
2.3.4.1.1 Integer Arithmetic Instructions .... 2-38
2.3.4.1.2 Integer Compare Instructions .... 2-40
2.3.4.1.3 Integer Logical Instructions .... 2-40
2.3.4.1.4 Integer Rotate and Shift Instructions .... 2-41
2.3.4.2 Floating-Point Instructions .... 2-42
2.3.4.2.1 Floating-Point Arithmetic Instructions .... 2-42
2.3.4.2.2 Floating-Point Multiply-Add Instructions .... 2-43
2.3.4.2.3 Floating-Point Rounding and Conversion Instructions .... 2-43
2.3.4.2.4 Floating-Point Compare Instructions .... 2-44
2.3.4.2.5 Floating-Point Status and Control Register Instructions .... 2-44
2.3.4.2.6 Floating-Point Move Instructions .... 2-45
2.3.4.3 Load and Store Instructions .... 2-45
2.3.4.3.1 Self-Modifying Code .... 2-46
2.3.4.3.2 Integer Load and Store Address Generation .... 2-46
2.3.4.3.3 Register Indirect Integer Load Instructions .... 2-46
2.3.4.3.4 Integer Store Instructions .... 2-48
2.3.4.3.5 Integer Store Gathering .... 2-49
2.3.4.3.6 Integer Load and Store with Byte-Reverse Instructions .... 2-49
2.3.4.3.7 Integer Load and Store Multiple Instructions .... 2-49
2.3.4.3.8 Integer Load and Store String Instructions .... 2-50
2.3.4.3.9 Floating-Point Load and Store Address Generation .... 2-51
2.3.4.3.10 Floating-Point Store Instructions .... 2-52
2.3.4.4 Branch and Flow Control Instructions .... 2-54
2.3.4.4.1 Branch Instruction Address Calculation .... 2-54
2.3.4.4.2 Branch Instructions .... 2-54
2.3.4.4.3 Condition Register Logical Instructions .... 2-55
2.3.4.4.4 Trap Instructions .... 2-55
2.3.4.5 System Linkage Instruction--UISA .... 2-56
2.3.4.6 Processor Control Instructions--UISA .... 2-56
2.3.4.6.1 Move to/from Condition Register Instructions .... 2-56
2.3.4.6.2 Move to/from Special-Purpose Register Instructions (UISA) .... 2-56
2.3.4.7 Memory Synchronization Instructions--UISA .... 2-59
2.3.5 PowerPC VEA Instructions .... 2-60
2.3.5.1 Processor Control Instructions--VEA .... 2-60
2.3.5.2 Memory Synchronization Instructions--VEA .... 2-61
2.3.5.3 Memory Control Instructions--VEA .... 2-62
2.3.5.3.1 User-Level Cache Instructions--VEA .... 2-62
2.3.5.4 Optional External Control Instructions .... 2-64
2.3.6 PowerPC OEA Instructions .... 2-65
2.3.6.1 System Linkage Instructions--OEA .... 2-65
2.3.6.2 Processor Control Instructions--OEA .... 2-65
2.3.6.3 Memory Control Instructions--OEA .... 2-66
2.3.6.3.1 Supervisor-Level Cache Management Instruction--(OEA) .... 2-66
2.3.6.3.2 Segment Register Manipulation Instructions (OEA) .... 2-67
2.3.6.3.3 Translation Lookaside Buffer Management Instructions--(OEA) .... 2-67
2.3.7 Recommended Simplified Mnemonics .... 2-68
Chapter 3 L1 Instruction and Data Cache Operation
3.1 Data Cache Organization .... 3-3
3.2 Instruction Cache Organization .... 3-4
3.3 Memory and Cache Coherency .... 3-5
3.3.1 Memory/Cache Access Attributes (WIMG Bits) .... 3-6
3.3.2 MEI Protocol .... 3-7
3.3.2.1 MEI Hardware Considerations .... 3-9
3.3.3 Coherency Precautions in Single Processor Systems .... 3-10
3.3.4 Coherency Precautions in Multiprocessor Systems .... 3-10
3.3.5 MPC750-Initiated Load/Store Operations .... 3-10
3.3.5.1 Performed Loads and Stores .... 3-11
3.3.5.2 Sequential Consistency of Memory Accesses .... 3-11
3.3.5.3 Atomic Memory References .... 3-11
3.4 Cache Control .... 3-13
3.4.1 Cache Control Parameters in HID0 .... 3-13
3.4.1.1 Data Cache Flash Invalidation .... 3-13
3.4.1.2 Data Cache Enabling/Disabling .... 3-13
3.4.1.3 Data Cache Locking .... 3-14
3.4.1.4 Instruction Cache Flash Invalidation .... 3-14
3.4.1.5 Instruction Cache Enabling/Disabling .... 3-14
3.4.1.6 Instruction Cache Locking .... 3-15
3.4.2 Cache Control Instructions .... 3-15
3.4.2.1 Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) .... 3-15
3.4.2.2 Data Cache Block Zero (dcbz) .... 3-16
3.4.2.3 Data Cache Block Store (dcbst) .... 3-17
3.4.2.4 Data Cache Block Flush (dcbf) .... 3-17
3.4.2.5 Data Cache Block Invalidate (dcbi) .... 3-17
3.4.2.6 Instruction Cache Block Invalidate (icbi) .... 3-18
3.5 Cache Operations .... 3-18
3.5.1 Cache Block Replacement/Castout Operations .... 3-18
3.5.2 Cache Flush Operations .... 3-21
3.5.3 Data Cache-Block-Fill Operations .... 3-21
3.5.4 Instruction Cache-Block-Fill Operations .... 3-21
3.5.5 Data Cache-Block-Push Operation .... 3-22
3.5.5.1 Enveloped High-Priority Cache-Block-Push Operation .... 3-22
3.6 L1 Caches and 60x Bus Transactions .... 3-22
3.6.1 Read Operations and the MEI Protocol .... 3-23
3.6.2 Bus Operations Caused by Cache Control Instructions .... 3-23
3.6.3 Snooping .... 3-25
3.6.4 Snoop Response to 60x Bus Transactions .... 3-26
3.6.5 Transfer Attributes .... 3-28
3.7 Bus Interface .... 3-30
3.8 MEI State Transactions .... 3-31
Chapter 4 Exceptions
4.1 MPC750 Microprocessor Exceptions .... 4-2
4.2 Exception Recognition and Priorities .... 4-4
4.3 Exception Processing .... 4-7
4.3.1 Enabling and Disabling Exceptions .... 4-10
4.3.2 Steps for Exception Processing .... 4-10
4.3.3 Setting MSR[RI] .... 4-11
4.3.4 Returning from an Exception Handler .... 4-11
4.4 Process Switching .... 4-12
4.5 Exception Definitions .... 4-12
4.5.1 System Reset Exception (0x00100) .... 4-13
4.5.2 Machine Check Exception (0x00200) .... 4-14
4.5.2.1 Machine Check Exception Enabled (MSR[ME] = 1) .... 4-16
4.5.2.2 Checkstop State (MSR[ME] = 0) .... 4-16
4.5.3 DSI Exception (0x00300) .... 4-17
4.5.4 ISI Exception (0x00400) .... 4-17
4.5.5 External Interrupt Exception (0x00500) .... 4-17
4.5.6 Alignment Exception (0x00600) .... 4-18
4.5.7 Program Exception (0x00700) .... 4-18
4.5.8 Floating-Point Unavailable Exception (0x00800) .... 4-19
4.5.9 Decrementer Exception (0x00900) .... 4-19
4.5.10 System Call Exception (0x00C00) .... 4-19
4.5.11 Trace Exception (0x00D00) .... 4-19
4.5.12 Floating-Point Assist Exception (0x00E00) .... 4-20
4.5.13 Performance Monitor Interrupt (0x00F00) .... 4-20
4.5.14 Instruction Address Breakpoint Exception (0x01300) .... 4-21
4.5.15 System Management Interrupt (0x01400) .... 4-22
4.5.16 Thermal Management Interrupt Exception (0x01700) .... 4-23
Chapter 5 Memory Management
5.1 MMU Overview .... 5-2
5.1.1 Memory Addressing .... 5-3
5.1.2 MMU Organization .... 5-4
5.1.3 Address Translation Mechanisms .... 5-7
5.1.4 Memory Protection Facilities .... 5-9
5.1.5 Page History Information .... 5-10
5.1.6 General Flow of MMU Address Translation .... 5-11
5.1.6.1 Real Addressing Mode and Block Address Translation Selection .... 5-11
5.1.6.2 Page Address Translation Selection .... 5-12
5.1.7 MMU Exceptions Summary .... 5-14
5.1.8 MMU Instructions and Register Summary .... 5-16
5.2 Real Addressing Mode .... 5-18
5.3 Block Address Translation .... 5-18
5.4 Memory Segment Model .... 5-19
5.4.1 Page History Recording .... 5-19
5.4.1.1 Referenced Bit .... 5-20
5.4.1.2 Changed Bit .... 5-21
5.4.1.3 Scenarios for Referenced and Changed Bit Recording .... 5-21
5.4.2 Page Memory Protection .... 5-23
5.4.3 TLB Description .... 5-23
5.4.3.1 TLB Organization .... 5-23
5.4.3.2 TLB Invalidation .... 5-25
5.4.4 Page Address Translation Summary .... 5-26
5.4.5 Page Table Search Operation .... 5-27
5.4.6 Page Table Updates .... 5-31
5.4.7 Segment Register Updates .... 5-32
Chapter 6 Instruction Timing
6.1 Terminology and Conventions .... 6-1
6.2 Instruction Timing Overview .... 6-3
6.3 Timing Considerations .... 6-7
6.3.1 General Instruction Flow .... 6-8
6.3.2 Instruction Fetch Timing .... 6-10
6.3.2.1 Cache Arbitration .... 6-11
6.3.2.2 Cache Hit .... 6-11
6.3.2.3 Cache Miss .... 6-14
6.3.2.4 L2 Cache Access Timing Considerations .... 6-15
6.3.3 Instruction Dispatch and Completion Considerations .... 6-16
6.3.3.1 Rename Register Operation .... 6-17
6.3.3.2 Instruction Serialization .... 6-17
6.4 Execution Unit Timings .... 6-18
6.4.1 Branch Processing Unit Execution Timing .... 6-18
6.4.1.1 Branch Folding and Removal of Fall-Through Branch Instructions .... 6-18
6.4.1.2 Branch Instructions and Completion .... 6-20
6.4.1.3 Branch Prediction and Resolution .... 6-21
6.4.1.3.1 Static Branch Prediction .... 6-22
6.4.1.3.2 Predicted Branch Timing Examples .... 6-22
6.4.2 Integer Unit Execution Timing .... 6-24
6.4.3 Floating-Point Unit Execution Timing .... 6-24
6.4.4 Effect of Floating-Point Exceptions on Performance .... 6-25
6.4.5 Load/Store Unit Execution Timing .... 6-25
6.4.6 Effect of Operand Placement on Performance .... 6-25
6.4.7 Integer Store Gathering .... 6-26
6.4.8 System Register Unit Execution Timing .... 6-27
6.5 Memory Performance Considerations .... 6-27
6.5.1 Caching and Memory Coherency .... 6-27
6.5.2 Effect of TLB Miss .... 6-28
6.6 Instruction Scheduling Guidelines .... 6-28
6.6.1 Branch, Dispatch, and Completion Unit Resource Requirements .... 6-29
6.6.1.1 Branch Resolution Resource Requirements .... 6-29
6.6.1.2 Dispatch Unit Resource Requirements .... 6-30
6.6.1.3 Completion Unit Resource Requirements .... 6-30
6.7 Instruction Latency Summary .... 6-31
Chapter 7 Signal Descriptions
7.1 Signal Configuration ..... 7-2
7.2 Signal Descriptions ..... 7-3
7.2.1 Address Bus Arbitration Signals ..... 7-4
7.2.1.1 Bus Request (BR)--Output ..... 7-4
7.2.1.2 Bus Grant (BG)--Input ..... 7-4
7.2.1.3 Address Bus Busy (ABB) ..... 7-5
7.2.1.3.1 Address Bus Busy (ABB)--Output ..... 7-5
7.2.1.3.2 Address Bus Busy (ABB)--Input ..... 7-5
7.2.2 Address Transfer Start Signals ..... 7-6
7.2.2.1 Transfer Start (TS) ..... 7-6
7.2.2.1.1 Transfer Start (TS)--Output ..... 7-6
7.2.2.1.2 Transfer Start (TS)--Input ..... 7-6
7.2.3 Address Transfer Signals ..... 7-7
7.2.3.1 Address Bus (A[0-31]) ..... 7-7
7.2.3.1.1 Address Bus (A[0-31])--Output ..... 7-7
7.2.3.1.2 Address Bus (A[0-31])--Input ..... 7-7
7.2.3.2 Address Bus Parity (AP[0-3]) ..... 7-7
7.2.3.2.1 Address Bus Parity (AP[0-3])--Output ..... 7-8
7.2.3.2.2 Address Bus Parity (AP[0-3])--Input ..... 7-8
7.2.4 Address Transfer Attribute Signals ..... 7-8
7.2.4.1 Transfer Type (TT[0-4]) ..... 7-8
7.2.4.1.1 Transfer Type (TT[0-4])--Output ..... 7-9
7.2.4.1.2 Transfer Type (TT[0-4])--Input ..... 7-9
7.2.4.2 Transfer Size (TSIZ[0-2])--Output ..... 7-11
7.2.4.3 Transfer Burst (TBST) ..... 7-12
7.2.4.3.1 Transfer Burst (TBST)--Output ..... 7-12
7.2.4.3.2 Transfer Burst (TBST)--Input ..... 7-12
7.2.4.4 Cache Inhibit (CI)--Output ..... 7-13
7.2.4.5 Write-Through (WT)--Output ..... 7-13
7.2.4.6 Global (GBL) ..... 7-13
7.2.4.6.1 Global (GBL)--Output ..... 7-13
7.2.4.6.2 Global (GBL)--Input ..... 7-14
7.2.5 Address Transfer Termination Signals ..... 7-14
7.2.5.1 Address Acknowledge (AACK)--Input ..... 7-14
7.2.5.2 Address Retry (ARTRY) ..... 7-14
7.2.5.2.1 Address Retry (ARTRY)--Output ..... 7-14
7.2.5.2.2 Address Retry (ARTRY)--Input ..... 7-15
7.2.6 Data Bus Arbitration Signals ..... 7-16
7.2.6.1 Data Bus Grant (DBG)--Input ..... 7-16
7.2.6.2 Data Bus Write Only (DBWO)--Input ..... 7-16
7.2.6.3 Data Bus Busy (DBB) ..... 7-17
7.2.6.3.1 Data Bus Busy (DBB)--Output ..... 7-17
7.2.6.3.2 Data Bus Busy (DBB)--Input ..... 7-17
7.2.7 Data Transfer Signals ..... 7-17
7.2.7.1 Data Bus (DH[0-31], DL[0-31]) ..... 7-18
7.2.7.1.1 Data Bus (DH[0-31], DL[0-31])--Output ..... 7-18
7.2.7.1.2 Data Bus (DH[0-31], DL[0-31])--Input ..... 7-18
7.2.7.2 Data Bus Parity (DP[0-7]) ..... 7-19
7.2.7.2.1 Data Bus Parity (DP[0-7])--Output ..... 7-19
7.2.7.2.2 Data Bus Parity (DP[0-7])--Input ..... 7-19
7.2.7.3 Data Bus Disable (DBDIS)--Input ..... 7-19
7.2.8 Data Transfer Termination Signals ..... 7-20
7.2.8.1 Transfer Acknowledge (TA)--Input ..... 7-20
7.2.8.2 Data Retry (DRTRY)--Input ..... 7-21
7.2.8.3 Transfer Error Acknowledge (TEA)--Input ..... 7-21
7.2.9 System Status Signals ..... 7-22
7.2.9.1 Interrupt (INT)--Input ..... 7-22
7.2.9.2 System Management Interrupt (SMI)--Input ..... 7-22
7.2.9.3 Machine Check Interrupt (MCP)--Input ..... 7-22
7.2.9.4 Checkstop Input (CKSTP_IN)--Input ..... 7-23
7.2.9.5 Checkstop Output (CKSTP_OUT)--Output ..... 7-23
7.2.9.6 Reset Signals ..... 7-24
7.2.9.6.1 Hard Reset (HRESET)--Input ..... 7-24
7.2.9.6.2 Soft Reset (SRESET)--Input ..... 7-24
7.2.9.7 Processor Status Signals ..... 7-25
7.2.9.7.1 Quiescent Request (QREQ)--Output ..... 7-25
7.2.9.7.2 Quiescent Acknowledge (QACK)--Input ..... 7-25
7.2.9.7.3 Reservation (RSRV)--Output ..... 7-25
7.2.9.7.4 Time Base Enable (TBEN)--Input ..... 7-26
7.2.9.7.5 TLBI Sync (TLBISYNC)--Input ..... 7-26
7.2.9.7.6 L2 Cache Interface ..... 7-26
7.2.9.8 L2 Address (L2ADDR[16-0])--Output ..... 7-26
7.2.9.9 L2 Data (L2DATA[0-63]) ..... 7-27
7.2.9.9.1 L2 Data (L2DATA[0-63])--Output ..... 7-27
7.2.9.9.2 L2 Data (L2DATA[0-63])--Input ..... 7-27
7.2.9.10 L2 Data Parity (L2DP[0-7]) ..... 7-27
7.2.9.10.1 L2 Data Parity (L2DP[0-7])--Output ..... 7-27
7.2.9.10.2 L2 Data Parity (L2DP[0-7])--Input ..... 7-27
7.2.9.11 L2 Chip Enable (L2CE)--Output ..... 7-28
7.2.9.12 L2 Write Enable (L2WE)--Output ..... 7-28
7.2.9.13 L2 Clock Out A (L2CLK_OUTA)--Output ..... 7-28
7.2.9.14 L2 Clock Out B (L2CLK_OUTB)--Output ..... 7-28
7.2.9.15 L2 Sync Out (L2SYNC_OUT)--Output ..... 7-29
7.2.9.16 L2 Sync In (L2SYNC_IN)--Input ..... 7-29
7.2.9.17 L2 Low-Power Mode Enable (L2ZZ)--Output ..... 7-29
7.2.10 IEEE 1149.1a-1993 Interface Description ..... 7-30
7.2.11 Clock Signals ..... 7-30
7.2.11.1 System Clock (SYSCLK)--Input ..... 7-30
7.2.11.2 Clock Out (CLK_OUT)--Output ..... 7-31
7.2.11.3 PLL Configuration (PLL_CFG[0-3])--Input ..... 7-31
7.2.12 Power and Ground Signals ..... 7-31
Chapter 8 System Interface Operation
8.1 MPC750 System Interface Overview ..... 8-1
8.1.1 Operation of the Instruction and Data L1 Caches ..... 8-2
8.1.2 Operation of the L2 Cache ..... 8-4
8.1.3 Operation of the System Interface ..... 8-4
8.1.4 Direct-Store Accesses ..... 8-6
8.2 Memory Access Protocol ..... 8-6
8.2.1 Arbitration Signals ..... 8-7
8.2.2 Address Pipelining and Split-Bus Transactions ..... 8-8
8.3 Address Bus Tenure ..... 8-9
8.3.1 Address Bus Arbitration ..... 8-9
8.3.2 Address Transfer ..... 8-12
8.3.2.1 Address Bus Parity ..... 8-13
8.3.2.2 Address Transfer Attribute Signals ..... 8-13
8.3.2.2.1 Transfer Type (TT[0-4]) Signals ..... 8-13
8.3.2.2.2 Transfer Size (TSIZ[0-2]) Signals ..... 8-14
8.3.2.2.3 Write-Through (WT) Signal ..... 8-14
8.3.2.2.4 Cache Inhibit (CI) Signal ..... 8-15
8.3.2.3 Burst Ordering During Data Transfers ..... 8-15
8.3.2.4 Effect of Alignment in Data Transfers ..... 8-15
8.3.2.4.1 Alignment of External Control Instructions ..... 8-17
8.3.3 Address Transfer Termination ..... 8-17
8.4 Data Bus Tenure ..... 8-19
8.4.1 Data Bus Arbitration ..... 8-19
8.4.1.1 Using the DBB Signal ..... 8-20
8.4.2 Data Bus Write Only ..... 8-21
8.4.3 Data Transfer ..... 8-21
8.4.4 Data Transfer Termination ..... 8-22
8.4.4.1 Normal Single-Beat Termination ..... 8-22
8.4.4.2 Normal Burst Termination ..... 8-24
8.4.4.3 Data Transfer Termination Due to a Bus Error ..... 8-26
8.4.5 Memory Coherency--MEI Protocol ..... 8-26
8.5 Timing Examples ..... 8-28
8.6 No-DRTRY Mode ..... 8-33
8.7 Interrupt, Checkstop, and Reset Signal Operation ..... 8-34
8.7.1 External Interrupts ..... 8-34
8.7.2 Checkstops ..... 8-35
8.7.3 Reset Inputs ..... 8-35
8.7.4 System Quiesce Control Signals ..... 8-35
8.8 Processor State Signals ..... 8-35
8.8.1 Support for the lwarx/stwcx. Instruction Pair ..... 8-36
8.8.2 TLBISYNC Input ..... 8-36
8.9 IEEE 1149.1a-1993 Compliant Interface ..... 8-36
8.9.1 JTAG/COP Interface ..... 8-36
8.10 Using Data Bus Write Only ..... 8-37
Chapter 9 L2 Cache Interface Operation
9.1 L2 Cache Interface Overview ..... 9-1
9.1.1 L2 Cache Operation ..... 9-2
9.1.2 L2 Cache Flushing ..... 9-4
9.1.3 L2 Cache Control Register (L2CR) ..... 9-4
9.1.4 L2 Cache Initialization ..... 9-6
9.1.5 L2 Cache Global Invalidation ..... 9-7
9.1.6 L2 Cache Test Features and Methods ..... 9-8
9.1.6.1 L2CR Support for L2 Cache Testing ..... 9-8
9.1.6.2 L2 Cache Testing ..... 9-9
9.1.7 L2 Clock Configuration ..... 9-10
9.1.8 L2 Cache SRAM Timing Examples ..... 9-10
9.1.8.1 Flow-Through Burst SRAM ..... 9-10
9.1.8.2 Pipelined Burst SRAM ..... 9-12
9.1.8.3 Late-Write SRAM ..... 9-13
Chapter 10 Power and Thermal Management
10.1 Dynamic Power Management ..... 10-1
10.2 Programmable Power Modes ..... 10-1
10.2.1 Power Management Modes ..... 10-2
10.2.1.1 Full-Power Mode with DPM Disabled ..... 10-2
10.2.1.2 Full-Power Mode with DPM Enabled ..... 10-2
10.2.1.3 Doze Mode ..... 10-3
10.2.1.4 Nap Mode ..... 10-3
10.2.1.5 Sleep Mode ..... 10-4
10.2.2 Power Management Software Considerations ..... 10-5
10.3 Thermal Assist Unit ..... 10-5
10.3.1 Thermal Assist Unit Overview ..... 10-6
10.3.2 Thermal Assist Unit Operation ..... 10-7
10.3.2.1 TAU Single Threshold Mode ..... 10-8
10.3.2.2 TAU Dual-Threshold Mode ..... 10-9
10.3.2.3 MPC750 Junction Temperature Determination ..... 10-9
10.3.2.4 Power Saving Modes and TAU Operation ..... 10-10
10.4 Instruction Cache Throttling ..... 10-10
Chapter 11 Performance Monitor
11.1 Performance Monitor Interrupt ..... 11-2
11.2 Special-Purpose Registers Used by Performance Monitor ..... 11-2
11.2.1 Performance Monitor Registers ..... 11-3
11.2.1.1 Monitor Mode Control Register 0 (MMCR0) ..... 11-3
11.2.1.2 User Monitor Mode Control Register 0 (UMMCR0) ..... 11-5
11.2.1.3 Monitor Mode Control Register 1 (MMCR1) ..... 11-5
11.2.1.4 User Monitor Mode Control Register 1 (UMMCR1) ..... 11-6
11.2.1.5 Performance Monitor Counter Registers (PMC1-PMC4) ..... 11-6
11.2.1.6 User Performance Monitor Counter Registers (UPMC1-UPMC4) ..... 11-9
11.2.1.7 Sampled Instruction Address Register (SIA) ..... 11-10
11.2.1.8 User Sampled Instruction Address Register (USIA) ..... 11-10
11.3 Event Counting ..... 11-10
11.4 Event Selection ..... 11-11
11.5 Warnings ..... 11-12
Appendix A PowerPC Instruction Set Listings
A.1 Instructions Sorted by Mnemonic ..... A-1
A.2 Instructions Sorted by Opcode ..... A-7
A.3 Instructions Grouped by Functional Categories ..... A-13
A.4 Instructions Sorted by Form ..... A-22
A.5 Instruction Set Legend ..... A-31
Appendix B Instructions Not Implemented
Appendix C MPC755 Embedded G3 Microprocessor
C.1 MPC755 Overview ..... C-2
C.2 MPC755 Functional Description ..... C-3
C.3 MPC755 Features ..... C-6
C.4 The MPC755 Programming Model (Chapter 2) ..... C-10
C.4.1 MPC755-Specific Registers ..... C-12
C.4.1.1 The MPC755 Additional SPR Encodings ..... C-13
C.4.1.2 Processor Version Register (PVR) ..... C-14
C.4.1.3 Hardware Implementation-Dependent Register 2 (HID2) ..... C-15
C.4.2 MPC750 and MPC755 Instruction Use ..... C-16
C.4.2.1 stfd Instruction Use ..... C-16
C.4.2.2 isync Instruction Use with mtsr and mtsrin ..... C-16
C.4.3 tlbld and tlbli Instructions ..... C-17
C.5 MPC755 L1 Instruction and Data Cache Operation (Chapter 3) ..... C-19
C.5.1 L1 Cache Coherency ..... C-20
C.5.1.1 Coherency Precautions in Single Processor Systems ..... C-20
C.5.1.2 dcbz and L1 Cache Coherency ..... C-21
C.5.2 Cache Locking ..... C-21
C.5.2.1 Cache Locking Terminology ..... C-21
C.5.2.2 Cache Locking Register Summary ..... C-22
C.5.2.3 Performing Data and Instruction Cache Locking ..... C-23
C.5.2.3.1 Enabling the Data Cache ..... C-23
C.5.2.3.2 Address Translation for Data Cache Locking ..... C-24
C.5.2.3.3 Disabling Exceptions for Data Cache Locking ..... C-24
C.5.2.3.4 Invalidating the Data Cache ..... C-25
C.5.2.3.5 Loading the Data Cache ..... C-26
C.5.2.3.6 Entire Data Cache Locking ..... C-26
C.5.2.3.7 Data Cache Way Locking ..... C-27
C.5.2.3.8 Invalidating the Data Cache (Even if Locked) ..... C-27
C.5.2.3.9 Enabling the Instruction Cache ..... C-28
C.5.2.3.10 Address Translation for Instruction Cache Locking ..... C-28
C.5.2.3.11 Disabling Exceptions for Instruction Cache Locking ..... C-29
C.5.2.3.12 Preloading Instructions into the Instruction Cache ..... C-29
C.5.2.3.13 MPC755 Prefetching Considerations ..... C-31
C.5.2.3.14 Entire Instruction Cache Locking ..... C-31
C.5.2.3.15 Instruction Cache Way Locking ..... C-31
C.5.2.3.16 Invalidating the Instruction Cache (Even if Locked) ..... C-32
C.6 MPC755 Exceptions (Chapter 4) ..... C-32
C.6.1 Instruction TLB Miss Exception (0x01000) ..... C-34
C.6.2 Data TLB Miss for Load Exception (0x01100) ..... C-34
C.6.3 Data TLB Miss for Store Exception (0x01200) ..... C-35
C.7 MPC755 Memory Management (Chapter 5) ..... C-35
C.7.1 Software Table Search Resources ..... C-36
C.7.2 Software Table Search Registers ..... C-37
C.7.2.1 Data and Instruction TLB Miss Address Registers (DMISS, IMISS) ..... C-37
C.7.2.2 Data and Instruction TLB Compare Registers (DCMP, ICMP) ..... C-38
C.7.2.3 Primary and Secondary Hash Address Registers (HASH1, HASH2) ..... C-38
C.7.2.4 Required Physical Address Register (RPA) ..... C-39
C.7.3 Software Table Search Operation ..... C-39
C.7.3.1 Flow for Example Exception Handlers ..... C-40
C.7.3.2 Code for Example Exception Handlers ..... C-44
C.8 MPC755 Instruction Timing (Chapter 6) ..... C-51
C.9 MPC755 Signal Descriptions (Chapter 7) ..... C-51
C.10 MPC755 System Interface Operation (Chapter 8) ..... C-51
C.10.1 MPC755 System Interface Overview ..... C-51
C.10.2 Address Bus Pipelining ..... C-52
C.10.3 Bus Clocking ..... C-53
C.10.4 32-Bit Data Bus Mode ..... C-53
C.10.4.1 Burst Ordering ..... C-54
C.10.4.2 Aligned Transfers ..... C-54
C.10.4.3 Misaligned Data Transfers ..... C-55
C.10.4.4 Selecting D32 Mode ..... C-56
C.10.4.5 Signal Relationships ..... C-56
C.11 MPC755 L2 Cache Interface Operation (Chapter 9) ..... C-58
C.11.1 MPC755 L2 Cache Interface Overview ..... C-58
C.11.1.1 L2 Cache Organization ..... C-59
C.11.1.2 L2 Cache Control ..... C-60
C.11.1.3 L2 Private Memory ..... C-60
C.11.2 L2 Interface Operation ..... C-60
C.11.2.1 L2 Cache Operation ..... C-61
C.11.2.1.1 L2 Cache Access Priorities ..... C-61
C.11.2.1.2 L2 Cache Services ..... C-61
C.11.2.1.3 L2 Cache Coherency and WIMG Bits ..... C-62
C.11.2.1.4 Single-Beat Accesses to L2 Interface ..... C-62
C.11.2.2 L2 Private Memory Operation ..... C-62
C.11.3 L2 Clocking ..... C-64
C.11.4 L2 Registers ..... C-65
C.11.4.1 L2 Cache Control Register (L2CR) ..... C-65
C.11.4.2 L2 Private Memory Control Register (L2PM) ..... C-68
C.11.5 L2 Address and Data Parity Signals ..... C-69
C.11.6 L2 Cache Programming Considerations ..... C-70
C.11.6.1 Enabling and Disabling the L2 Cache ..... C-70
C.11.6.2 L2 Cache Global Invalidation ..... C-71
C.11.6.3 L2 Cache Flushing ..... C-72
C.11.6.4 Other Cache Control Instructions and Effect on L2 Cache ..... C-72
C.11.6.5 Cache Control Instructions and Effect on Private Memory Operation ..... C-73
C.11.6.6 L2 Cache Testing ..... C-73
C.11.7 L2 Cache SRAM Timing Examples ..... C-76
C.11.7.1 Pipelined PB3 Burst SRAM ..... C-76
C.11.8 Private Memory SRAM Timing ..... C-78
C.12 Power and Thermal Management (Chapter 10) ..... C-78
C.13 Performance Monitor (Chapter 11) ..... C-78
Appendix D User's Manual Revision History
Index
Figures
Figure Number  Title  Page Number
1-1 MPC750 Microprocessor Block Diagram ..... 1-3
1-2 Cache Organization ..... 1-13
1-3 System Interface ..... 1-16
1-4 MPC750 Microprocessor Signal Groups ..... 1-18
1-5 MPC750 Microprocessor Programming Model--Registers ..... 1-22
1-6 Pipeline Diagram ..... 1-34
2-1 Programming Model--MPC750 Microprocessor Registers ..... 2-3
2-2 Instruction Address Breakpoint Register ..... 2-9
2-3 Hardware Implementation-Dependent Register 0 (HID0) ..... 2-10
2-4 Hardware Implementation-Dependent Register 1 (HID1) ..... 2-14
2-5 Monitor Mode Control Register 0 (MMCR0) ..... 2-15
2-6 Monitor Mode Control Register 1 (MMCR1) ..... 2-17
2-7 Performance Monitor Counter Registers (PMC1-PMC4) ..... 2-17
2-8 Sampled Instruction Address Registers (SIA) ..... 2-21
2-9 Instruction Cache Throttling Control Register (ICTC) ..... 2-22
2-10 Thermal Management Registers 1-2 (THRM1-THRM2) ..... 2-23
2-11 Thermal Management Register 3 (THRM3) ..... 2-24
2-12 L2 Cache Control Register (L2CR) ..... 2-25
3-1 Cache Integration ..... 3-2
3-2 Data Cache Organization ..... 3-4
3-3 Instruction Cache Organization ..... 3-5
3-4 MEI Cache Coherency Protocol--State Diagram (WIM = 001) ..... 3-8
3-5 PLRU Replacement Algorithm ..... 3-19
3-6 Double-Word Address Ordering--Critical Double Word First ..... 3-23
3-7 Bus Interface Address Buffers ..... 3-31
4-1 Machine Status Save/Restore Register 0 (SRR0) ..... 4-7
4-2 Machine Status Save/Restore Register 1 (SRR1) ..... 4-8
4-3 Machine State Register (MSR) ..... 4-8
5-1 MMU Conceptual Block Diagram--32-Bit Implementations ..... 5-5
5-2 MPC750 Microprocessor IMMU Block Diagram ..... 5-6
5-3 MPC750 Microprocessor DMMU Block Diagram ..... 5-7
5-4 Address Translation Types ..... 5-9
5-5 General Flow of Address Translation (Real Addressing Mode and Block) ..... 5-12
5-6 General Flow of Page and Direct-Store Interface Address Translation ..... 5-13
5-7 Segment Register and DTLB Organization ..... 5-24
5-8  Page Address Translation Flow--TLB Hit .... 5-27
5-9  Primary Page Table Search .... 5-30
5-10  Secondary Page Table Search Flow .... 5-31
6-1  Pipelined Execution Unit .... 6-4
6-2  Superscalar/Pipeline Diagram .... 6-5
6-3  MPC750 Microprocessor Pipeline Stages .... 6-7
6-4  Instruction Flow Diagram .... 6-10
6-5  Instruction Timing--Cache Hit .... 6-12
6-6  Instruction Timing--Cache Miss .... 6-15
6-7  Branch Folding .... 6-19
6-8  Removal of Fall-Through Branch Instruction .... 6-19
6-9  Branch Completion .... 6-20
6-10  Branch Instruction Timing .... 6-23
7-1  MPC750 Signal Groups .... 7-3
8-1  MPC750 Microprocessor Block Diagram .... 8-3
8-2  Timing Diagram Legend .... 8-5
8-3  Overlapping Tenures on the MPC750 Bus for a Single-Beat Transfer .... 8-6
8-4  Address Bus Arbitration .... 8-10
8-5  Address Bus Arbitration Showing Bus Parking .... 8-11
8-6  Address Bus Transfer .... 8-13
8-7  Snooped Address Cycle with ARTRY .... 8-19
8-8  Data Bus Arbitration .... 8-20
8-9  Normal Single-Beat Read Termination .... 8-23
8-10  Normal Single-Beat Write Termination .... 8-23
8-11  Normal Burst Transaction .... 8-24
8-12  Termination with DRTRY .... 8-25
8-13  Read Burst with TA Wait States and DRTRY .... 8-25
8-14  MEI Cache Coherency Protocol--State Diagram (WIM = 001) .... 8-27
8-15  Fastest Single-Beat Reads .... 8-28
8-16  Fastest Single-Beat Writes .... 8-29
8-17  Single-Beat Reads Showing Data-Delay Controls .... 8-30
8-18  Single-Beat Writes Showing Data Delay Controls .... 8-31
8-19  Burst Transfers with Data Delay Controls .... 8-32
8-20  Use of Transfer Error Acknowledge (TEA) .... 8-33
8-21  IEEE 1149.1a-1993 Compliant Boundary Scan Interface .... 8-37
8-22  Data Bus Write Only Transaction .... 8-38
9-1  Typical 1-Mbyte L2 Cache Configuration .... 9-2
9-2  Burst Read-Write-Read L2 Cache Access (Flow-Through) .... 9-11
9-3  Burst Read-Modify-Write L2 Cache Access (Flow-Through) .... 9-11
9-4  Burst Read-Write-Write L2 Cache Access (Flow-Through) .... 9-11
9-5  Burst Read-Write-Read L2 Cache Access (Pipelined) .... 9-12
9-6  Burst Read-Modify-Write L2 Cache Access (Pipelined) .... 9-12
9-7  Burst Read-Write-Write L2 Cache Access (Pipelined) .... 9-13
9-8  Burst Read-Write-Read L2 Cache Access (Late-Write SRAM) .... 9-13
9-9  Burst Read-Modify-Write L2 Cache Access (Late-Write SRAM) .... 9-14
9-10  Burst Read-Write-Write L2 Cache Access (Late-Write SRAM) .... 9-14
10-1  Thermal Assist Unit Block Diagram .... 10-6
11-1  Monitor Mode Control Register 0 (MMCR0) .... 11-3
11-2  Monitor Mode Control Register 1 (MMCR1) .... 11-5
11-3  Performance Monitor Counter Registers (PMC1-PMC4) .... 11-6
11-4  Sampled Instruction Address Registers (SIA) .... 11-10
C-1  MPC755 Block Diagram .... C-5
C-2  Programming Model--MPC755 Microprocessor Registers .... C-11
C-3  Processor Version Register (PVR) .... C-14
C-4  Hardware Implementation-Dependent Register 2 (HID2) .... C-15
C-5  Derivation of Key Bit for SRR1 .... C-34
C-6  DMISS and IMISS Registers .... C-37
C-7  DCMP and ICMP Registers .... C-38
C-8  HASH1 and HASH2 Registers .... C-38
C-9  Required Physical Address (RPA) Register .... C-39
C-10  Flow for Example Software Table Search Operation .... C-41
C-11  Check and Set R and C Bit Flow .... C-42
C-12  Page Fault Setup Flow .... C-43
C-13  Setup for Protection Violation Exceptions .... C-44
C-14  32-Bit Data Bus Mode--8-Beat Burst (No Retry Conditions) .... C-57
C-15  32-Bit Data Bus Mode--2-Beat Burst (with DRTRY) .... C-57
C-16  Typical Synchronous 1-Mbyte L2 Cache System Using PB3 SRAM .... C-59
C-17  L2 Cache Control Register (L2CR) .... C-65
C-18  L2 Private Memory Control Register (L2PM) .... C-69
C-19  Burst Read-Read-Read L2 Cache Access (Pipelined) .... C-77
C-20  Burst Write-Write-Write L2 Cache Access (Pipelined) .... C-77
C-21  Burst Read-Write-Read L2 Cache Access (Pipelined) .... C-78
Tables
Table Number  Title  Page Number
i  Acronyms and Abbreviated Terms .... xxxvi
ii  Terminology Conventions .... xxxix
iii  Instruction Field Conventions .... xl
1-1  Architecture-Defined Registers on the MPC750 (Excluding SPRs) .... 1-23
1-2  Architecture-Defined SPRs Implemented by the MPC750 .... 1-24
1-3  MPC750-Specific Registers .... 1-24
1-4  MPC750 Microprocessor Exception Classifications .... 1-30
1-5  Exceptions and Conditions .... 1-30
2-1  Additional MSR Bits .... 2-5
2-2  Additional SRR1 Bits .... 2-7
2-3  Instruction Address Breakpoint Register Bit Settings .... 2-10
2-4  HID0 Bit Functions .... 2-10
2-5  HID0[BCLK] and HID0[ECLK] CLK_OUT Configuration .... 2-14
2-6  HID1 Bit Functions .... 2-14
2-7  MMCR0 Bit Settings .... 2-15
2-8  MMCR1 Bit Settings .... 2-17
2-9  PMCn Bit Settings .... 2-18
2-10  PMC1 Events--MMCR0[19-25] Select Encodings .... 2-18
2-11  PMC2 Events--MMCR0[26-31] Select Encodings .... 2-19
2-12  PMC3 Events--MMCR1[0-4] Select Encodings .... 2-19
2-13  PMC4 Events--MMCR1[5-9] Select Encodings .... 2-20
2-14  ICTC Bit Settings .... 2-22
2-15  THRM1-THRM2 Bit Settings .... 2-23
2-16  Valid THRM1/THRM2 States .... 2-24
2-17  THRM3 Bit Settings .... 2-24
2-18  L2CR Bit Settings .... 2-25
2-19  Settings Caused by Hard Reset (Used at Power-On) .... 2-27
2-20  Floating-Point Operand Data Type Behavior .... 2-30
2-21  Floating-Point Result Data Type Behavior .... 2-31
2-22  Integer Arithmetic Instructions .... 2-39
2-23  Integer Compare Instructions .... 2-40
2-24  Integer Logical Instructions .... 2-40
2-25  Integer Rotate Instructions .... 2-41
2-26  Integer Shift Instructions .... 2-42
2-27  Floating-Point Arithmetic Instructions .... 2-42
2-28  Floating-Point Multiply-Add Instructions .... 2-43
2-29  Floating-Point Rounding and Conversion Instructions .... 2-44
2-30  Floating-Point Compare Instructions .... 2-44
2-31  Floating-Point Status and Control Register Instructions .... 2-44
2-32  Floating-Point Move Instructions .... 2-45
2-33  Integer Load Instructions .... 2-47
2-34  Integer Store Instructions .... 2-48
2-35  Integer Load and Store with Byte-Reverse Instructions .... 2-49
2-36  Integer Load and Store Multiple Instructions .... 2-50
2-37  Integer Load and Store String Instructions .... 2-50
2-38  Floating-Point Load Instructions .... 2-52
2-39  Floating-Point Store Instructions .... 2-52
2-40  Store Floating-Point Single Behavior .... 2-53
2-41  Store Floating-Point Double Behavior .... 2-53
2-42  Branch Instructions .... 2-55
2-43  Condition Register Logical Instructions .... 2-55
2-44  Trap Instructions .... 2-55
2-45  System Linkage Instruction--UISA .... 2-56
2-46  Move to/from Condition Register Instructions .... 2-56
2-47  Move to/from Special-Purpose Register Instructions (UISA) .... 2-56
2-48  PowerPC Encodings .... 2-57
2-49  SPR Encodings for MPC750-Defined Registers (mfspr) .... 2-58
2-50  Memory Synchronization Instructions--UISA .... 2-59
2-51  Move from Time Base Instruction .... 2-60
2-52  Memory Synchronization Instructions--VEA .... 2-62
2-53  User-Level Cache Instructions .... 2-63
2-54  External Control Instructions .... 2-65
2-55  System Linkage Instructions--OEA .... 2-65
2-56  Move to/from Machine State Register Instructions .... 2-66
2-57  Move to/from Special-Purpose Register Instructions (OEA) .... 2-66
2-58  Supervisor-Level Cache Management Instruction .... 2-66
2-59  Segment Register Manipulation Instructions .... 2-67
2-60  Translation Lookaside Buffer Management Instruction .... 2-67
3-1  MEI State Definitions .... 3-7
3-2  PLRU Bit Update Rules .... 3-20
3-3  PLRU Replacement Block Selection .... 3-20
3-4  Bus Operations Caused by Cache Control Instructions (WIM = 001) .... 3-24
3-5  Response to Snooped Bus Transactions .... 3-26
3-6  Address/Transfer Attribute Summary .... 3-29
3-7  MEI State Transitions .... 3-31
4-1  MPC750 Microprocessor Exception Classifications .... 4-3
4-2  Exceptions and Conditions .... 4-3
4-3  MPC750 Exception Priorities .... 4-6
4-4  MSR Bit Settings .... 4-8
4-5  IEEE Floating-Point Exception Mode Bits .... 4-10
4-6  MSR Setting Due to Exception .... 4-12
4-7  System Reset Exception--Register Settings .... 4-13
4-8  HID0 Machine Check Enable Bits .... 4-15
4-9  Machine Check Exception--Register Settings .... 4-16
4-10  Performance Monitor Interrupt Exception--Register Settings .... 4-21
4-11  Instruction Address Breakpoint Exception--Register Settings .... 4-21
4-12  System Management Interrupt Exception--Register Settings .... 4-22
4-13  Thermal Management Interrupt Exception--Register Settings .... 4-23
5-1  MMU Feature Summary .... 5-3
5-2  Access Protection Options for Pages .... 5-10
5-3  Translation Exception Conditions .... 5-14
5-4  Other MMU Exception Conditions for the MPC750 Processor .... 5-15
5-5  MPC750 Microprocessor Instruction Summary--Control MMUs .... 5-17
5-6  MPC750 Microprocessor MMU Registers .... 5-17
5-7  Table Search Operations to Update History Bits--TLB Hit Case .... 5-20
5-8  Model for Guaranteed R and C Bit Settings .... 5-22
6-1  Performance Effects of Memory Operand Placement .... 6-25
6-2  TLB Miss Latencies .... 6-28
6-3  Branch Instructions .... 6-31
6-4  System Register Instructions .... 6-31
6-5  Condition Register Logical Instructions .... 6-32
6-6  Integer Instructions .... 6-32
6-7  Floating-Point Instructions .... 6-34
6-8  Load and Store Instructions .... 6-35
7-1  Transfer Type Encodings for MPC750 Bus Master .... 7-9
7-2  MPC750 Snoop Hit Response .... 7-10
7-3  Data Transfer Size .... 7-12
7-4  Data Bus Lane Assignments .... 7-18
7-5  DP[0-7] Signal Assignments .... 7-19
7-6  IEEE Interface Pin Descriptions .... 7-30
8-1  Transfer Size Signal Encodings .... 8-14
8-2  Burst Ordering .... 8-15
8-3  Aligned Data Transfers .... 8-16
8-4  Misaligned Data Transfers (Four-Byte Examples) .... 8-17
9-1  L2 Cache Control Register .... 9-4
10-1  MPC750 Microprocessor Programmable Power Modes .... 10-2
10-2  THRM1 and THRM2 Bit Field Settings .... 10-7
10-3  THRM3 Bit Field Settings .... 10-7
10-4  Valid THRM1 and THRM2 Bit Settings .... 10-8
10-5  ICTC Bit Field Settings .... 10-11
11-1  Performance Monitor SPRs .... 11-3
11-2  MMCR0 Bit Settings .... 11-4
11-3  MMCR1 Bit Settings .... 11-5
11-4  PMCn Bit Settings .... 11-6
11-5  PMC1 Events--MMCR0[19-25] Select Encodings .... 11-7
11-6  PMC2 Events--MMCR0[26-31] Select Encodings .... 11-7
11-7  PMC3 Events--MMCR1[0-4] Select Encodings .... 11-8
11-8  PMC4 Events--MMCR1[5-9] Select Encodings .... 11-9
A-1  Complete Instruction List Sorted by Mnemonic .... A-1
A-2  Complete Instruction List Sorted by Opcode .... A-7
A-3  Integer Arithmetic Instructions .... A-13
A-4  Integer Compare Instructions .... A-14
A-5  Integer Logical Instructions .... A-14
A-6  Integer Rotate Instructions .... A-15
A-7  Integer Shift Instructions .... A-15
A-8  Floating-Point Arithmetic Instructions .... A-15
A-9  Floating-Point Multiply-Add Instructions .... A-16
A-10  Floating-Point Rounding and Conversion Instructions .... A-16
A-11  Floating-Point Compare Instructions .... A-16
A-12  Floating-Point Status and Control Register Instructions .... A-16
A-13  Integer Load Instructions .... A-17
A-14  Integer Store Instructions .... A-17
A-15  Integer Load and Store with Byte Reverse Instructions .... A-18
A-16  Integer Load and Store Multiple Instructions .... A-18
A-17  Integer Load and Store String Instructions .... A-18
A-18  Memory Synchronization Instructions .... A-18
A-19  Floating-Point Load Instructions .... A-18
A-20  Floating-Point Store Instructions .... A-19
A-21  Floating-Point Move Instructions .... A-19
A-22  Branch Instructions .... A-19
A-23  Condition Register Logical Instructions .... A-19
A-24  System Linkage Instructions .... A-20
A-25  Trap Instructions .... A-20
A-26  Processor Control Instructions .... A-20
A-27  Cache Management Instructions .... A-20
A-28  Segment Register Manipulation Instructions .... A-21
A-29  Lookaside Buffer Management Instructions .... A-21
A-30  External Control Instructions .... A-21
A-31  I-Form .... A-22
A-32  B-Form .... A-22
A-33  SC-Form .... A-22
A-34  D-Form .... A-22
A-35  X-Form .... A-24
A-36  XL-Form .... A-27
A-37  XFX-Form .... A-28
A-38  XFL-Form .... A-28
A-39  XO-Form .... A-29
A-40  A-Form .... A-29
A-41  M-Form .... A-30
A-42  PowerPC Instruction Set Legend .... A-31
B-1  32-Bit Instructions Not Implemented by the MPC750 Processor .... B-1
B-2  64-Bit Instructions Not Implemented by the MPC750 Processor .... B-1
C-1  Document Revision History .... C-2
C-2  Additional SPR Encodings .... C-13
C-3  Hardware Implementation Dependent Register 2 (HID2) Field Descriptions .... C-15
C-4  Translation Lookaside Buffer Management Instructions .... C-17
C-5  Cache Organization .... C-22
C-6  HID0 Bits Used to Perform Cache Locking .... C-22
C-7  HID2 Bits Used to Perform Cache Locking .... C-22
C-8  MSR Bits Used to Perform Cache Locking .... C-23
C-9  Example BAT Settings for Cache Locking .... C-24
C-10  MSR Bits for Disabling Exceptions .... C-25
C-11  MPC755 DWLCK[0-2] Encodings .... C-27
C-12  Example BAT Settings for Cache Locking .... C-28
C-13  MSR Bits for Disabling Exceptions .... C-29
C-14  MPC755 IWLCK[0-2] Encodings .... C-31
C-15  Software Table Search Exceptions and Conditions .... C-33
C-16  Instruction and Data TLB Miss Exceptions--Register Settings .... C-33
C-17  Implementation-Specific Resources for Software Table Search Operations--Summary .... C-36
C-18  DCMP and ICMP Bit Settings .... C-38
C-19  HASH1 and HASH2 Bit Settings .... C-39
C-20  RPA Bit Settings .... C-39
C-21  TLB Load and Store Instruction Latencies .... C-51
C-22  Voltage-Select Signal Descriptions .... C-51
C-23  Burst Ordering .... C-54
C-24  Aligned Data Transfers--32-Bit Data Bus Mode .... C-55
C-25  Misaligned Data Transfers Example--32-Bit Data Bus Mode .... C-55
C-26  L2 Cache Sizes and Data RAM Organizations .... C-60
C-27  L2 Cache Control Register
C-28  L2PM Bit Settings
C-29  L2 SRAM Configuration
C-30  L2 Data Parity Signal Associations
About This Book
The primary objective of this user's manual is to describe the functionality of the MPC750 RISC microprocessor family, which includes the MPC750, MPC755, MPC740, and MPC745 microprocessors. Unless noted otherwise, descriptions in this manual that refer to MPC750 apply to all members of the MPC750 family. This book is intended as a companion to the Programming Environments Manual for 32-Bit Implementations of the PowerPC Architecture (referred to as the Programming Environments Manual).

NOTE: About the Companion Programming Environments Manual
The MPC750 RISC Microprocessor User's Manual, which describes MPC750 features not defined by the architecture, is to be used with the Programming Environments Manual.

Because the PowerPC architecture definition is flexible to support a broad range of processors, the Programming Environments Manual describes generally those features common to these processors and indicates which features are optional or may be implemented differently in the design of each processor. Note that the Programming Environments Manual describes features of the PowerPC architecture only for 32-bit implementations. Contact your sales representative for a copy of the Programming Environments Manual.

This document and the Programming Environments Manual distinguish between the three levels, or programming environments, of the PowerPC architecture, which are as follows:

* PowerPC user instruction set architecture (UISA)--The UISA defines the level of the architecture to which user-level software should conform. The UISA defines the base user-level instruction set, user-level registers, data types, memory conventions, and the memory and programming models seen by application programmers.
* PowerPC virtual environment architecture (VEA)--The VEA, which is the smallest component of the PowerPC architecture, defines additional user-level functionality that falls outside typical user-level software requirements. The VEA describes the memory model for an environment in which multiple processors or other devices can access external memory and defines aspects of the cache model and cache control instructions from a user-level perspective. VEA resources are particularly useful for optimizing memory accesses and for managing resources in an environment in which other processors and other devices can access external memory. Implementations that conform to the VEA also conform to the UISA but may not necessarily adhere to the OEA.
* PowerPC operating environment architecture (OEA)--The OEA defines supervisor-level resources typically required by an operating system. It defines the memory management model, supervisor-level registers, and the exception model. Implementations that conform to the OEA also conform to the UISA and VEA.
Note that some resources are defined more generally at one level in the architecture and more specifically at another. For example, conditions that cause a floating-point exception are defined by the UISA, but the exception mechanism itself is defined by the OEA. Because it is important to distinguish between the levels of the architecture to ensure compatibility across multiple platforms, those distinctions are shown clearly throughout this book.

For ease in reference, topics in this book are presented in the same order as the Programming Environments Manual. Topics build upon one another, beginning with a description and complete summary of the MPC750 programming model (registers and instructions) and progressing to more specific, architecture-based topics regarding the cache, exception, and memory management models. As such, chapters may include information from multiple levels of the architecture. For example, the discussion of the cache model uses information from both the VEA and the OEA.

The PowerPC Architecture: A Specification for a New Family of RISC Processors defines the architecture from the perspective of the three programming environments and remains the defining document for the PowerPC architecture. For information about ordering Motorola documentation, see "Suggested Reading," on page xxxiv.

Information in this book is subject to change without notice, as described in the disclaimers on the title page of this book. As with any technical documentation, it is the readers' responsibility to be sure they are using the most recent version of the documentation. For updates to this document, refer to http://www.motorola.com/semiconductors.
Audience
This manual is intended for system software and hardware developers and applications programmers who want to develop products for the MPC750. It is assumed that the reader understands operating systems, microprocessor system design, basic principles of RISC processing, and details of the PowerPC architecture.
Organization
Following is a summary and a brief description of the major sections of this manual:

* Chapter 1, "Overview," is useful for readers who want a general understanding of the features and functions of the PowerPC architecture and the MPC750. This chapter describes the flexible nature of the PowerPC architecture definition, and provides an overview of how the PowerPC architecture defines the register set, operand conventions, addressing modes, instruction set, cache model, exception model, and memory management model.
* Chapter 2, "Programming Model," is useful for software engineers who need to understand the MPC750-specific registers, operand conventions, and details regarding how PowerPC instructions are implemented on the MPC750. Instructions are organized by function.
* Chapter 3, "L1 Instruction and Data Cache Operation," discusses the cache and memory model as implemented on the MPC750.
* Chapter 4, "Exceptions," describes the exception model defined in the PowerPC OEA and the specific exception model implemented on the MPC750.
* Chapter 5, "Memory Management," describes the MPC750's implementation of the memory management unit specifications provided by the OEA.
* Chapter 6, "Instruction Timing," provides information about latencies, interlocks, special situations, and various conditions to help make programming more efficient. This chapter is of special interest to software engineers and system designers.
* Chapter 7, "Signal Descriptions," describes signals of the MPC750.
* Chapter 8, "System Interface Operation," describes signal timings for various operations. It also provides information for interfacing to the MPC750.
* Chapter 9, "L2 Cache Interface Operation," describes the use of the MPC750 L2 cache and cache controller. Note that this feature is not supported on the MPC740 or the MPC745.
* Chapter 10, "Power and Thermal Management," provides information about power saving and thermal management modes for the MPC750 family.
* Chapter 11, "Performance Monitor," describes the operation of the performance monitor diagnostic tool incorporated in the MPC750 family.
* Appendix A, "PowerPC Instruction Set Listings," lists PowerPC instructions, indicating those that are not implemented by the MPC750; it also includes those that are specific to the MPC750. Separate tables are provided, listing the instructions by mnemonic, opcode, function, and form. A quick reference table contains general information for each instruction, such as the architecture level, privilege level, and form, and indicates if the instruction is 64-bit and optional.
* Appendix B, "Instructions Not Implemented," provides a list of the 32-bit and 64-bit PowerPC instructions that are not implemented in the MPC750.
* Appendix C, "MPC755 Embedded G3 Microprocessor," describes the differences between the MPC750 and the MPC755. The appendix also serves to identify any differences between the MPC740 and the MPC745.
* Appendix D, "User's Manual Revision History," provides a revision history for this book, and identifies all the major changes that were made between Revision 0 of this book and Revision 1.

This manual also includes a glossary and an index.
Suggested Reading
This section lists additional reading that provides background for the information in this manual as well as general information about the PowerPC architecture.
General Information
The following documentation, available through Morgan-Kaufmann Publishers, 340 Pine Street, Sixth Floor, San Francisco, CA, provides useful information about the PowerPC architecture and computer architecture in general:

* The PowerPC Architecture: A Specification for a New Family of RISC Processors, Second Edition, by International Business Machines, Inc. For updates to the specification, see http://www.austin.ibm.com/tech/ppc-chg.html.
* PowerPC Microprocessor Common Hardware Reference Platform: A System Architecture, by Apple Computer, Inc., International Business Machines, Inc., and Motorola, Inc.
* Computer Architecture: A Quantitative Approach, Second Edition, by John L. Hennessy and David A. Patterson
* Computer Organization and Design: The Hardware/Software Interface, Second Edition, by David A. Patterson and John L. Hennessy
Related Documentation
Motorola documentation is available from the sources listed on the back cover of this manual; the document order numbers are included in parentheses for ease in ordering:

* Programming Environments Manual for 32-Bit Implementations of the PowerPC Architecture (MPEFPC32B/AD)--Describes resources defined by the PowerPC architecture.
* User's manuals--These books provide details about individual implementations and are intended for use with the Programming Environments Manual.
* Addenda/errata to user's manuals--Because some processors have follow-on parts, an addendum is provided that describes the additional features and functionality changes. These addenda are intended for use with the corresponding user's manuals.
* Hardware specifications--Hardware specifications provide specific data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations. Separate hardware specifications are provided for each part described in this book.
* Technical summaries--Each device has a technical summary that provides an overview of its features. This document is roughly equivalent to the overview (Chapter 1) of an implementation's user's manual.
* The Programmer's Reference Guide for the PowerPC Architecture: MPCPRG/D--This concise reference includes the register summary, memory control model, exception vectors, and the PowerPC instruction set.
* The Programmer's Pocket Reference Guide for the PowerPC Architecture: MPCPRGREF/D--This foldout card provides an overview of PowerPC registers, instructions, and exceptions for 32-bit implementations.
* Application notes--These short documents address specific design issues useful to programmers and engineers working with Motorola processors.
Additional literature is published as new processors become available. For a current list of documentation, refer to http://www.motorola.com/motorola.
Conventions
This document uses the following notational conventions:

cleared/set      When a bit takes the value zero, it is said to be cleared; when it takes a value of one, it is said to be set.
mnemonics        Instruction mnemonics are shown in lowercase bold.
italics          Italics indicate variable command parameters, for example, bcctrx. Book titles in text are set in italics. Internal signals are set in italics, for example, qual BG.
0x0              Prefix to denote hexadecimal number
0b0              Prefix to denote binary number
rA, rB           Instruction syntax used to identify a source GPR
rD               Instruction syntax used to identify a destination GPR
frA, frB, frC    Instruction syntax used to identify a source FPR
frD              Instruction syntax used to identify a destination FPR
REG[FIELD]       Abbreviations for registers are shown in uppercase text. Specific bits, fields, or ranges appear in brackets. For example, MSR[LE] refers to the little-endian mode enable bit in the machine state register.
x                In some contexts, such as signal encodings, an unitalicized x indicates a don't care.
x (italic)       An italicized x indicates an alphanumeric variable.
n (italic)       An italicized n indicates a numeric variable.
¬                NOT logical operator
&                AND logical operator
|                OR logical operator
0000             Indicates reserved bits or bit fields in a register. Although these bits can be written to as ones or zeros, they are always read as zeros.
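As a minimal illustration of the REG[FIELD] notation and the 0x and 0b prefixes, the following Python sketch reads single-bit fields from a 32-bit register value. It relies on the PowerPC convention that bit 0 is the most-significant bit. The helper names (reg_bit, msr_field) and the sample MSR value are hypothetical. The bit positions used for MSR[EE], MSR[PR], and MSR[LE] follow the 32-bit PowerPC OEA definition of the MSR and are not taken from this section.

```python
# Sketch of the REG[FIELD] notation, assuming PowerPC bit numbering
# (bit 0 is the most-significant bit of the register).

MSR_WIDTH = 32  # the MSR is 32 bits wide on 32-bit implementations

# A few single-bit MSR fields; positions follow the 32-bit PowerPC OEA
# definition (an assumption here, not stated in this section).
MSR_FIELDS = {"EE": 16, "PR": 17, "LE": 31}


def reg_bit(value, bit, width=32):
    """Return bit `bit` of `value`, counting bit 0 as the msb."""
    return (value >> (width - 1 - bit)) & 1


def msr_field(msr, name):
    """Read a single-bit MSR field by name, for example MSR[LE]."""
    return reg_bit(msr, MSR_FIELDS[name], MSR_WIDTH)


# Hypothetical sample value, written with the 0x (hexadecimal) prefix.
msr = 0x00008001
# The same value written with the 0b (binary) prefix.
assert msr == 0b1000_0000_0000_0001

for name in ("EE", "PR", "LE"):
    state = "set" if msr_field(msr, name) else "cleared"
    print(f"MSR[{name}] is {state}")
```

Because bit 0 is the most-significant bit, MSR[LE] (bit 31) is the least-significant bit of the value, which is the reverse of the numbering used by most C bit-manipulation idioms.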
Acronyms and Abbreviations
Table i contains acronyms and abbreviations that are used in this document.
Table i. Acronyms and Abbreviated Terms
Term      Meaning
BAT       Block address translation
BIST      Built-in self test
BHT       Branch history table
BIU       Bus interface unit
BPU       Branch processing unit
BTIC      Branch target instruction cache
BSDL      Boundary-scan description language
BUID      Bus unit ID
CMOS      Complementary metal-oxide semiconductor
COP       Common on-chip processor
CR        Condition register
CQ        Completion queue
CTR       Count register
DABR      Data address breakpoint register
DAR       Data address register
DBAT      Data BAT
DCMP      Data TLB compare
DEC       Decrementer register
DLL       Delay-locked loop
DMISS     Data TLB miss address
DMMU      Data MMU
DPM       Dynamic power management
DSISR     Register used for determining the source of a DSI exception
DTLB      Data translation lookaside buffer
EA        Effective address
EAR       External access register
ECC       Error checking and correction
FIFO      First-in-first-out
FPR       Floating-point register
FPSCR     Floating-point status and control register
FPU       Floating-point unit
GPR       General-purpose register
HIDn      Hardware implementation-dependent register
IABR      Instruction address breakpoint register
IBAT      Instruction BAT
ICTC      Instruction cache throttling control register
IEEE      Institute for Electrical and Electronics Engineers
IMMU      Instruction MMU
IQ        Instruction queue
ITLB      Instruction translation lookaside buffer
IU        Integer unit
JTAG      Joint Test Action Group
L2        Secondary cache (Level 2 cache)
L2CR      L2 cache control register
LIFO      Last-in-first-out
LR        Link register
LRU       Least recently used
LSB       Least-significant byte
lsb       Least-significant bit
LSU       Load/store unit
MEI       Modified/exclusive/invalid
MESI      Modified/exclusive/shared/invalid--cache coherency protocol
MMCRn     Monitor mode control registers
MMU       Memory management unit
MSB       Most-significant byte
msb       Most-significant bit
MSR       Machine state register
NaN       Not a number
No-op     No operation
OEA       Operating environment architecture
PID       Processor identification tag
PLL       Phase-locked loop
PLRU      Pseudo least recently used
PMCn      Performance monitor counter registers
POR       Power-on reset
POWER     Performance Optimized with Enhanced RISC architecture
PTE       Page table entry
PTEG      Page table entry group
PVR       Processor version register
RAW       Read-after-write
RISC      Reduced instruction set computing
RTL       Register transfer language
RWITM     Read with intent to modify
RWNITM    Read with no intent to modify
SDA       Sampled data address register
SDR1      Register that specifies the page table base address for virtual-to-physical address translation
SIA       Sampled instruction address register
SPR       Special-purpose register
SRn       Segment register
SRU       System register unit
SRR0      Machine status save/restore register 0
SRR1      Machine status save/restore register 1
TAU       Thermal management assist unit
TB        Time base facility
TBL       Time base lower register
TBU       Time base upper register
THRMn     Thermal management registers
TLB       Translation lookaside buffer
TTL       Transistor-to-transistor logic
UIMM      Unsigned immediate value
UISA      User instruction set architecture
UMMCRn    User monitor mode control registers
UPMCn     User performance monitor counter registers
USIA      User sampled instruction address register
VEA       Virtual environment architecture
WAR       Write-after-read
WAW       Write-after-write
WIMG      Write-through/caching-inhibited/memory-coherency enforced/guarded bits
XATC      Extended address transfer code
XER       Register used for indicating conditions such as carries and overflows for integer operations
Terminology Conventions
Table ii describes terminology conventions used in this manual and the equivalent terminology used in the PowerPC architecture specification.
Table ii. Terminology Conventions
The Architecture Specification            This Manual
Data storage interrupt (DSI)              DSI exception
Extended mnemonics                        Simplified mnemonics
Fixed-point unit (FXU)                    Integer unit (IU)
Instruction storage interrupt (ISI)       ISI exception
Interrupt                                 Exception
Privileged mode (or privileged state)     Supervisor-level privilege
Problem mode (or problem state)           User-level privilege
Real address                              Physical address
Relocation                                Translation
Storage (locations)                       Memory
Storage (the act of)                      Access
Store in                                  Write back
Store through                             Write through
Table iii describes instruction field notation used in this manual.
Table iii. Instruction Field Conventions
The Architecture Specification    Equivalent to:
BA, BB, BT                        crbA, crbB, crbD (respectively)
BF, BFA                           crfD, crfS (respectively)
D                                 d
DS                                ds
FLM                               FM
FRA, FRB, FRC, FRT, FRS           frA, frB, frC, frD, frS (respectively)
FXM                               CRM
RA, RB, RT, RS                    rA, rB, rD, rS (respectively)
SI                                SIMM
U                                 IMM
UI                                UIMM
/, //, ///                        0...0 (shaded)
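To make the field names in Table iii concrete, the following Python sketch splits a D-form instruction word into the fields this manual calls rD, rA, and SIMM (RT, RA, and SI in the architecture specification). The field boundaries (primary opcode in bits 0-5, rD in bits 6-10, rA in bits 11-15, SIMM in bits 16-31, with bit 0 as the most-significant bit) come from the PowerPC architecture definition of the D-form rather than from this section. The function name decode_d_form and the sample words are hypothetical.

```python
def decode_d_form(word):
    """Split a 32-bit D-form instruction word into named fields.

    Field names follow this manual's notation (rD, rA, SIMM); the
    architecture specification calls the same fields RT, RA, and SI.
    """
    opcode = (word >> 26) & 0x3F   # bits 0-5: primary opcode
    rd = (word >> 21) & 0x1F       # bits 6-10: destination GPR
    ra = (word >> 16) & 0x1F       # bits 11-15: source GPR
    simm = word & 0xFFFF           # bits 16-31: signed immediate
    if simm & 0x8000:              # sign-extend the 16-bit immediate
        simm -= 0x10000
    return {"opcode": opcode, "rD": rd, "rA": ra, "SIMM": simm}


# 0x38640008 encodes addi r3,r4,8 (primary opcode 14).
print(decode_d_form(0x38640008))
# 0x3863FFFF encodes addi r3,r3,-1, showing the sign extension.
print(decode_d_form(0x3863FFFF))
```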
Chapter 1 Overview
This chapter provides an overview of the MPC750 microprocessor features, including a block diagram showing the major functional components. It provides information about how the MPC750 implementation complies with the PowerPC architecture definition. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
1.1 MPC750 Microprocessor Overview
This section describes the features and general operation of the MPC750 and provides a block diagram showing major functional units. The MPC750 is a reduced instruction set computer (RISC) CPU, which implements the PowerPC architecture. The MPC750 implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits.

The MPC750 is a superscalar processor that can complete two instructions simultaneously. It incorporates the following six execution units:

* Floating-point unit (FPU)
* Branch processing unit (BPU)
* System register unit (SRU)
* Load/store unit (LSU)
* Two integer units (IUs): IU1 executes all integer instructions. IU2 executes all integer instructions except multiply and divide instructions.
The ability to execute several instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency and throughput for MPC750-based systems. Most integer instructions execute in one clock cycle. The FPU is pipelined; the tasks it performs are broken into subtasks that are implemented as three successive stages. Typically, a floating-point instruction occupies only one of the three stages at a time, freeing the previous stage to work on the next floating-point instruction. Thus, three single-precision floating-point instructions can be in the FPU execute stage at a time. Double-precision add instructions have a three-cycle latency; double-precision multiply and multiply-add instructions have a four-cycle latency.
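The benefit of the three-stage FPU pipeline can be sketched with a simple cycle count. The Python model below is an idealization, not a timing specification: it assumes a stream of independent instructions that each spend one cycle in each of the three stages, with no stalls or dependencies. The function names are hypothetical. See Chapter 6, "Instruction Timing," for actual latencies.

```python
FPU_STAGES = 3  # the FPU is implemented as three successive stages


def pipelined_cycles(n_instructions, stages=FPU_STAGES):
    """Cycles to finish n independent instructions in an ideal pipeline.

    The first instruction needs `stages` cycles; each later one
    finishes one cycle after its predecessor.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, stages=FPU_STAGES):
    """Cycles if each instruction had to finish before the next started."""
    return max(n_instructions, 0) * stages


for n in (1, 3, 10):
    print(n, pipelined_cycles(n), unpipelined_cycles(n))
```

Under these assumptions the pipelined cost approaches one cycle per instruction as the stream grows, while the latency of any single instruction stays at three cycles.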
Figure 1-1 shows the parallel organization of the execution units (shaded in the diagram). The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is a conceptual model that shows basic features rather than attempting to show how features are implemented physically.

The MPC750 has independent on-chip, 32-Kbyte, eight-way set-associative, physically addressed caches for instructions and data and independent instruction and data memory management units (MMUs). Each MMU has a 128-entry, two-way set-associative translation lookaside buffer (DTLB and ITLB) that saves recently used page address translations. Block address translation is done through the four-entry instruction and data block address translation (IBAT and DBAT) arrays, defined by the PowerPC architecture. During block translation, effective addresses are compared simultaneously with all four BAT entries. For information about the L1 cache, see Chapter 3, "L1 Instruction and Data Cache Operation."

The L2 cache is implemented with an on-chip, two-way, set-associative tag memory, and with external, synchronous SRAMs for data storage. The external SRAMs are accessed through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache interface is not implemented in the MPC740. For information about the L2 cache implementation, see Chapter 9, "L2 Cache Interface Operation."

The MPC750 has a 32-bit address bus and a 64-bit data bus. Multiple devices compete for system resources through a central external arbiter. The MPC750's three-state cache-coherency protocol (MEI) supports the exclusive, modified, and invalid states, a compatible subset of the MESI (modified/exclusive/shared/invalid) four-state protocol, and it operates coherently in systems with four-state caches. The MPC750 supports single-beat and burst data transfers for memory accesses and memory-mapped I/O operations. The system interface is described in Chapter 7, "Signal Descriptions," and Chapter 8, "System Interface Operation."

The MPC750 has four software-controllable power-saving modes. Three static modes, doze, nap, and sleep, progressively reduce power dissipation. When functional units are idle, a dynamic power management mode causes those units to enter a low-power mode automatically without affecting operational performance, software execution, or external hardware. The MPC750 also provides a thermal assist unit (TAU) and a way to reduce the instruction fetch rate for limiting power dissipation. Power management is described in Chapter 10, "Power and Thermal Management."

The MPC750 uses an advanced CMOS process technology and is fully compatible with TTL devices.
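The L1 cache parameters given above imply a fixed split of a 32-bit physical address into tag, set index, and byte offset. The Python sketch below works through that arithmetic. It takes the 32-byte cache block size from the feature list in Section 1.2.1, and it assumes power-of-two sizes and conventional indexing by the low-order address bits above the byte offset. The function names and the sample address are hypothetical.

```python
def cache_geometry(size_bytes, ways, block_bytes, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits).

    Assumes all sizes are powers of two.
    """
    sets = size_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = sets.bit_length() - 1           # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 32-Kbyte, eight-way set-associative cache with 32-byte blocks
sets, off_bits, idx_bits, tag_bits = cache_geometry(32 * 1024, 8, 32)
print(sets, off_bits, idx_bits, tag_bits)

tag, index, offset = split_address(0x12345678, off_bits, idx_bits)
print(hex(tag), index, offset)
```

With these parameters the offset and index together span 12 bits, which equals the byte offset within a 4-Kbyte page.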
[Figure 1-1: block diagram showing the instruction unit (fetcher, branch processing unit with 64-entry BTIC, BHT, CTR, and LR, six-word instruction queue, and dispatch unit); the execution units and their reservation stations (integer unit 1, integer unit 2, system register unit, load/store unit, and floating-point unit); the GPR and FPR files with six rename buffers each; the completion unit with its six-entry reorder buffer; the instruction and data MMUs; the 32-Kbyte instruction and data caches; the 60x bus interface unit; and the L2 controller, L2CR, and L2 tags (not in the MPC740). Additional features listed: time base counter/decrementer, clock multiplier, JTAG/COP interface, and thermal/power management.]

Figure 1-1. MPC750 Microprocessor Block Diagram
1.2 MPC750 Microprocessor Features
This section lists features of the MPC750. The interrelationship of these features is shown in Figure 1-1.
1.2.1 Overview of the MPC750 Microprocessor Features

Major features of the MPC750 are as follows:

* High-performance, superscalar microprocessor
  -- As many as four instructions can be fetched from the instruction cache per clock cycle
  -- As many as two instructions can be dispatched per clock
  -- As many as six instructions can execute per clock (including two integer instructions)
  -- Single-clock-cycle execution for most instructions
* Six independent execution units and two register files
  -- BPU featuring both static and dynamic branch prediction
     - 64-entry (16-set, four-way set-associative) branch target instruction cache (BTIC), a cache of branch instructions that have been encountered in branch/loop code sequences. If a target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically, if a fetch access hits the BTIC, it provides the first two instructions in the target stream.
     - 512-entry branch history table (BHT) with two bits per entry for four levels of prediction--not-taken, strongly not-taken, taken, strongly taken
     - Branch instructions that do not update the count register (CTR) or link register (LR) are removed from the instruction stream.
  -- Two integer units (IUs) that share thirty-two GPRs for integer operands
     - IU1 can execute any integer instruction.
     - IU2 can execute all integer instructions except multiply and divide instructions (shift, rotate, arithmetic, and logical instructions). Most instructions that execute in the IU2 take one cycle to execute. The IU2 has a single-entry reservation station.
  -- Three-stage FPU
     - Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
     - Supports non-IEEE mode for time-critical operations
     - Hardware support for denormalized numbers
     - Single-entry reservation station
     - Thirty-two 64-bit FPRs for single- or double-precision operands
  -- Two-stage LSU
     - Two-entry reservation station
     - Single-cycle, pipelined cache access
     - Dedicated adder performs EA calculations
     - Performs alignment and precision conversion for floating-point data
     - Performs alignment and sign extension for integer data
     - Three-entry store queue
     - Supports both big- and little-endian modes
  -- SRU handles miscellaneous instructions
     - Executes CR logical and Move to/Move from SPR instructions (mtspr and mfspr)
     - Single-entry reservation station
* Rename buffers
  -- Six GPR rename buffers
  -- Six FPR rename buffers
  -- Condition register buffering supports two CR writes per clock
* Completion unit
  -- The completion unit retires an instruction from the six-entry reorder buffer (completion queue) when all instructions ahead of it have been completed, the instruction has finished execution, and no exceptions are pending.
  -- Guarantees sequential programming model (precise exception model)
  -- Monitors all dispatched instructions and retires them in order
  -- Tracks unresolved branches and flushes instructions from the mispredicted branch
  -- Retires as many as two instructions per clock
* Separate on-chip instruction and data caches (Harvard architecture)
  -- 32-Kbyte, eight-way set-associative instruction and data caches
  -- Pseudo least-recently-used (PLRU) replacement algorithm
  -- 32-byte (eight-word) cache block
  -- Physically indexed/physical tags. (Note that the PowerPC architecture refers to physical address space as real address space.)
  -- Cache write-back or write-through operation programmable on a per-page or per-block basis
  -- Instruction cache can provide four instructions per clock; data cache can provide two words per clock
  -- Caches can be disabled in software
  -- Caches can be locked in software
  -- Data cache coherency (MEI) maintained in hardware
  -- The critical double word is made available to the requesting unit when it is burst into the line-fill buffer. The cache is nonblocking, so it can be accessed during this operation.
* Level 2 (L2) cache interface (The L2 cache interface is not supported in the MPC740.)
  -- On-chip two-way set-associative L2 cache controller and tags
  -- External data SRAMs
  -- Support for 256-Kbyte, 512-Kbyte, and 1-Mbyte L2 caches
  -- 64-byte (256-Kbyte/512-Kbyte) and 128-byte (1 Mbyte) sectored line size
  -- Supports flow-through (register-buffer), pipelined (register-register), and pipelined late-write (register-register) synchronous burst SRAMs
* Separate memory management units (MMUs) for instructions and data
  -- 52-bit virtual address; 32-bit physical address
  -- Address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte segments
  -- Memory programmable as write-back/write-through, cacheable/noncacheable, and coherency enforced/coherency not enforced on a page or block basis
  -- Separate IBATs and DBATs (four each) also defined as SPRs
  -- Separate instruction and data translation lookaside buffers (TLBs)
     - Both TLBs are 128-entry, two-way set associative, and use LRU replacement algorithm
     - TLBs are hardware-reloadable (that is, the page table search is performed in hardware)
* Separate bus interface units for system memory and for the L2 cache
  -- Bus interface features include the following:
     - Selectable bus-to-core clock frequency ratios of 2x, 2.5x, 3x, 3.5x, 4x, 4.5x ... 8x. (2x to 8x, all half-clock multipliers in-between)
     - A 64-bit, split-transaction external data bus with burst transfers
     - Support for address pipelining and limited out-of-order bus transactions
     - Single-entry load queue
     - Single-entry instruction fetch queue
     - Two-entry L1 cache castout queue
     - No-DRTRY mode eliminates the DRTRY signal from the qualified bus grant. This allows the forwarding of data during load operations to the internal core one bus cycle sooner than if the use of DRTRY is enabled.
  -- L2 cache interface features (which are not implemented on the MPC740) include the following:
     - Core-to-L2 frequency divisors of 1, 1.5, 2, 2.5, and 3
     - Four-entry L2 cache castout queue in L2 cache BIU
     - 17-bit address bus
     - 64-bit data bus
* Multiprocessing support features include the following:
  -- Hardware-enforced, three-state cache coherency protocol (MEI) for data cache.
  -- Load/store with reservation instruction pair for atomic memory references, semaphores, and other multiprocessor operations
* Power and thermal management
  -- Three static modes, doze, nap, and sleep, progressively reduce power dissipation:
     - Doze--All the functional units are disabled except for the time base/decrementer registers and the bus snooping logic.
     - Nap--The nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the PLL in a powered state.
     - Sleep--All internal functional units are disabled, after which external system logic may disable the PLL and SYSCLK.
  -- Thermal management facility provides software-controllable thermal management. Thermal management is performed through the use of three supervisor-level registers and an MPC750-specific thermal management exception.
  -- Instruction cache throttling provides control of instruction fetching to limit power consumption.
* Performance monitor can be used to help debug system designs and improve software efficiency.
* In-system testability and debugging features through JTAG boundary-scan capability
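The clock relationships in the bus interface features can be sketched as simple arithmetic: the core clock is the bus clock times a bus-to-core multiplier (2x to 8x in half steps), and the L2 clock is the core clock divided by one of the listed divisors. The Python sketch below is illustrative only. The function names and the sample frequencies are hypothetical, and the hardware specifications remain the authority on which combinations a given part supports.

```python
from fractions import Fraction

# Bus-to-core multipliers: 2x through 8x in half-clock steps.
BUS_TO_CORE = [Fraction(n, 2) for n in range(4, 17)]
# Core-to-L2 divisors: 1, 1.5, 2, 2.5, and 3.
CORE_TO_L2 = [Fraction(n, 2) for n in range(2, 7)]


def core_mhz(bus_mhz, multiplier):
    """Core frequency for a bus frequency and a listed multiplier."""
    m = Fraction(str(multiplier))
    if m not in BUS_TO_CORE:
        raise ValueError(f"unsupported bus-to-core ratio: {multiplier}")
    return bus_mhz * m


def l2_mhz(core_frequency_mhz, divisor):
    """L2 interface frequency for a core frequency and a listed divisor."""
    d = Fraction(str(divisor))
    if d not in CORE_TO_L2:
        raise ValueError(f"unsupported core-to-L2 divisor: {divisor}")
    return core_frequency_mhz / d


# Hypothetical example: 66-MHz bus, 4.5x multiplier, L2 divisor of 2.
core = core_mhz(66, 4.5)
print(float(core), float(l2_mhz(core, 2)))
```

Fractions are used instead of floats so that half-step ratios compare exactly against the lists of supported values.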
1.2.2
Instruction Flow
As shown in Figure 1-1, the MPC750 instruction unit provides centralized control of instruction flow to the execution units. The instruction unit contains a sequential fetcher, six-entry instruction queue (IQ), dispatch unit, and BPU. It determines the address of the
next instruction to be fetched based on information from the sequential fetcher and from the BPU. See Chapter 6, "Instruction Timing," for a detailed discussion of instruction timing. The sequential fetcher loads instructions from the instruction cache into the instruction queue. The BPU extracts branch instructions from the sequential fetcher. Branch instructions that cannot be resolved immediately are predicted using either the MPC750-specific dynamic branch prediction or the architecture-defined static branch prediction. Branch instructions that do not affect the LR or CTR are removed from the instruction stream. The BPU folds branch instructions when a branch is taken (or predicted as taken); branch instructions that are not taken, or predicted as not taken, are removed from the instruction stream through the dispatch mechanism. Instructions issued beyond a predicted branch do not complete execution until the branch is resolved, preserving the programming model of sequential execution. If branch prediction is incorrect, the instruction unit flushes all predicted path instructions, and instructions are fetched from the correct path.
1.2.2.1
Instruction Queue and Dispatch Unit
The instruction queue (IQ), shown in Figure 1-1, holds as many as six instructions and loads up to four instructions from the instruction cache during a single processor clock cycle. The instruction fetcher continuously attempts to load as many instructions as there were vacancies in the IQ in the previous clock cycle. All instructions except branch instructions are dispatched to their respective execution units from the bottom two positions in the instruction queue (IQ0 and IQ1) at a maximum rate of two instructions per cycle. Reservation stations are provided for the IU1, IU2, FPU, LSU, and SRU. The dispatch unit checks for source and destination register dependencies, determines whether a position is available in the completion queue, and inhibits subsequent instruction dispatching as required. Branch instructions can be detected, decoded, and predicted from anywhere in the instruction queue. For a more detailed discussion of instruction dispatch, see Section 6.3.3, "Instruction Dispatch and Completion Considerations."
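The fetch and dispatch rates described above can be illustrated with a toy cycle model. This is only a sketch under simplifying assumptions: it ignores branches, cache misses, and dispatch stalls, and the names `simulate_iq`, `IQ_DEPTH`, `MAX_FETCH`, and `MAX_DISPATCH` are hypothetical, not taken from the manual.

```python
IQ_DEPTH = 6       # the IQ holds as many as six instructions
MAX_FETCH = 4      # up to four instructions loaded per clock cycle
MAX_DISPATCH = 2   # up to two instructions dispatched per cycle (IQ0 and IQ1)


def simulate_iq(total_instructions, cycles):
    """Return one (fetched, dispatched, occupancy) tuple per simulated cycle."""
    remaining = total_instructions
    occupancy = 0
    prev_vacancies = IQ_DEPTH  # the queue starts empty
    trace = []
    for _ in range(cycles):
        # Dispatch from the bottom of the queue, at most two per cycle.
        dispatched = min(MAX_DISPATCH, occupancy)
        occupancy -= dispatched
        # The fetcher tries to fill the vacancies seen in the previous cycle.
        fetched = min(MAX_FETCH, prev_vacancies, remaining)
        occupancy += fetched
        remaining -= fetched
        prev_vacancies = IQ_DEPTH - occupancy
        trace.append((fetched, dispatched, occupancy))
    return trace
```

In this simplified model a long straight-line instruction stream settles into fetching two and dispatching two instructions per cycle, because dispatch bandwidth, not fetch bandwidth, is the limit.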
1.2.2.2
Branch Processing Unit (BPU)
The BPU receives branch instructions from the sequential fetcher and performs CR lookahead operations on conditional branches to resolve them early, achieving the effect of a zero-cycle branch in many cases. Unconditional branch instructions and conditional branch instructions in which the condition is known can be resolved immediately. For unresolved conditional branch instructions, the branch path is predicted using either the architecture-defined static branch
prediction or the MPC750-specific dynamic branch prediction. Dynamic branch prediction is enabled if HID0[BHT] = 1. When a prediction is made, instruction fetching, dispatching, and execution continue from the predicted path, but instructions cannot complete and write back results to architected registers until the prediction is determined to be correct (resolved). When a prediction is incorrect, the instructions from the incorrect path are flushed from the processor and processing begins from the correct path. The MPC750 allows a second branch instruction to be predicted; instructions from the second predicted instruction stream can be fetched but cannot be dispatched. Dynamic prediction is implemented using a 512-entry branch history table (BHT), a cache that provides two bits per entry that together indicate four levels of prediction for a branch instruction--not-taken, strongly not-taken, taken, strongly taken. When dynamic branch prediction is disabled, the BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore, when an unresolved conditional branch instruction is encountered, the MPC750 executes instructions from the predicted target stream although the results are not committed to architected registers until the conditional branch is resolved. This execution can continue until a second unresolved branch instruction is encountered. When a branch is taken (or predicted as taken), the instructions from the untaken path must be flushed and the target instruction stream must be fetched into the IQ. The BTIC is a 64-entry cache that contains the most recently used branch target instructions, typically in pairs. When an instruction fetch hits in the BTIC, the instructions arrive in the instruction queue in the next clock cycle, a clock cycle sooner than they would arrive from the instruction cache. Additional instructions arrive from the instruction cache in the next clock cycle. 
The BTIC reduces the number of missed opportunities to dispatch instructions and gives the processor a one-cycle head start on processing the target stream. The BPU contains an adder to compute branch target addresses and three user-control registers--the link register (LR), the count register (CTR), and the CR. The BPU calculates the return pointer for subroutine calls and saves it into the LR for certain types of branch instructions. The LR also contains the branch target address for the Branch Conditional to Link Register (bclrx) instruction. The CTR contains the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. Because the LR and CTR are SPRs, their contents can be copied to or from any GPR. Because the BPU uses dedicated registers rather than GPRs or FPRs, execution of branch instructions is largely independent from execution of integer and floating-point instructions.
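The two-bit, four-level prediction scheme of the BHT can be sketched as a table of saturating counters. The following model is illustrative only: the manual text above gives the table size and the four prediction levels, but the indexing function, the initial counter value, and the exact update rule used here are assumptions, and the class and method names are hypothetical.

```python
BHT_ENTRIES = 512
# Counter values 0..3 mapped to the four prediction levels.
STATE_NAMES = ("strongly not-taken", "not-taken", "taken", "strongly taken")


class BranchHistoryTable:
    def __init__(self):
        # Assumption: every entry starts in the weak "not-taken" state.
        self.counters = [1] * BHT_ENTRIES

    def _index(self, address):
        # Assumption: index with low-order bits of the word-aligned address.
        return (address >> 2) % BHT_ENTRIES

    def predict(self, address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        """Move the counter one step toward the resolved direction, saturating."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

    def state(self, address):
        return STATE_NAMES[self.counters[self._index(address)]]
```

With a saturating counter, a branch in the "strongly taken" state needs two consecutive not-taken outcomes before the prediction flips, which keeps a single loop exit from disturbing the prediction for the next pass through the loop.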
1.2.2.3
Completion Unit
The completion unit operates closely with the instruction unit. Instructions are fetched and dispatched in program order. At the point of dispatch, the program order is maintained by assigning each dispatched instruction a successive entry in the six-entry completion queue.
The completion unit tracks instructions from dispatch through execution and retires them in program order from the two bottom entries in the completion queue (CQ0 and CQ1). Instructions cannot be dispatched to an execution unit unless there is a vacancy in the completion queue. Branch instructions that do not update the CTR or LR are removed from the instruction stream and do not take an entry in the completion queue. Instructions that update the CTR and LR follow the same dispatch and completion procedures as non-branch instructions, except that they are not issued to an execution unit. Completing an instruction commits execution results to architected registers (GPRs, FPRs, LR, and CTR). In-order completion ensures the correct architectural state when the MPC750 must recover from a mispredicted branch or any exception. Retiring an instruction removes it from the completion queue. For a more detailed discussion of instruction completion, see Section 6.3.3, "Instruction Dispatch and Completion Considerations."
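The in-order retirement rule can be sketched with a small queue model. This is a minimal illustration, assuming each entry simply carries a "finished" flag; the class `CompletionQueue` and its method names are hypothetical and do not correspond to any hardware interface.

```python
from collections import deque

CQ_DEPTH = 6     # six-entry completion queue
MAX_RETIRE = 2   # instructions retire from the two bottom entries (CQ0, CQ1)


class CompletionQueue:
    def __init__(self):
        self.entries = deque()  # oldest instruction (CQ0) is at the left

    def can_dispatch(self):
        # Dispatch requires a vacancy in the completion queue.
        return len(self.entries) < CQ_DEPTH

    def dispatch(self, name):
        if not self.can_dispatch():
            raise RuntimeError("completion queue full; dispatch must stall")
        self.entries.append({"name": name, "finished": False})

    def finish(self, name):
        """Mark an instruction as having finished execution (possibly out of order)."""
        for entry in self.entries:
            if entry["name"] == name:
                entry["finished"] = True
                return
        raise KeyError(name)

    def retire(self):
        """Retire up to two finished instructions, strictly in program order."""
        retired = []
        while (self.entries and len(retired) < MAX_RETIRE
               and self.entries[0]["finished"]):
            retired.append(self.entries.popleft()["name"])
        return retired
```

An instruction that finishes early still waits behind older, unfinished instructions, which is what preserves a precise architectural state on a mispredicted branch or exception.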
1.2.2.4
Independent Execution Units
In addition to the BPU, the MPC750 provides the five execution units described in the following sections.
1.2.2.4.1
Integer Units (IUs)
The integer units IU1 and IU2 are shown in Figure 1-1. The IU1 can execute any integer instruction; the IU2 can execute any integer instruction except multiplication and division instructions. Each IU has a single-entry reservation station that can receive instructions from the dispatch unit and operands from the GPRs or the rename buffers. Each IU consists of three single-cycle subunits--a fast adder/comparator, a subunit for logical operations, and a subunit for performing rotates, shifts, and count-leading-zero operations. These subunits handle all one-cycle arithmetic instructions; only one subunit can execute an instruction at a time. The IU1 has a 32-bit integer multiplier/divider as well as the adder, shift, and logical units of the IU2. The multiplier supports early exit for operations that do not require full 32- x 32-bit multiplication. Each IU has a dedicated result bus (not shown in Figure 1-1) that connects to rename buffers.
1.2.2.4.2
Floating-Point Unit (FPU)
The FPU, shown in Figure 1-1, is designed such that single-precision operations require only a single pass, with a latency of three cycles. As instructions are dispatched to the FPU's reservation station, source operand data can be accessed from the FPRs or from the FPR
rename buffers. Results in turn are written to the rename buffers and are made available to subsequent instructions. Instructions pass through the reservation station in dispatch order. The FPU contains a single-precision multiply-add array and the floating-point status and control register (FPSCR). The multiply-add array allows the MPC750 to efficiently implement multiply and multiply-add operations. The FPU is pipelined so that one single- or double-precision instruction can be issued per clock cycle. Thirty-two 64-bit floating-point registers are provided to support floating-point operations. Stalls due to contention for FPRs are minimized by automatic allocation of the six floating-point rename registers. The MPC750 writes the contents of the rename registers to the appropriate FPR when floating-point instructions are retired by the completion unit. The MPC750 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. (Note that exception is also referred to as interrupt in the architecture specification.)
1.2.2.4.3
Load/Store Unit (LSU)
The LSU executes all load and store instructions and provides the data transfer interface between the GPRs, FPRs, and the cache/memory subsystem. The LSU calculates effective addresses, performs data alignment, and provides sequencing for load/store string and multiple instructions. Load and store instructions are issued and translated in program order; however, some memory accesses can occur out of order. Synchronizing instructions can be used to enforce strict ordering. When there are no data dependencies and the guarded bit for the page or block is cleared, a maximum of one out-of-order cacheable load operation can execute per cycle, with a two-cycle total latency on a cache hit. Data returned from the cache is held in a rename register until the completion logic commits the value to a GPR or FPR. Stores cannot be executed out of order and are held in the store queue until the completion logic signals that the store operation is to be completed to memory. The MPC750 executes store instructions with a maximum throughput of one per cycle and a three-cycle total latency to the data cache. The time required to perform the actual load or store operation depends on the processor/bus clock ratio and whether the operation involves the on-chip cache, the L2 cache, system memory, or an I/O device.
1.2.2.4.4
System Register Unit (SRU)
The SRU executes various system-level instructions, as well as condition register logical operations and move to/from special-purpose register instructions. To maintain system state, most instructions executed by the SRU are execution-serialized; that is, the instruction is held for execution in the SRU until all previously issued instructions have executed. Results from execution-serialized instructions executed by the SRU are not available or forwarded for subsequent instructions until the instruction completes.
1.2.3
Memory Management Units (MMUs)
The MPC750's MMUs support up to 4 Petabytes (2^52) of virtual memory and 4 Gigabytes (2^32) of physical memory for instructions and data. The MMUs also control access privileges for these spaces on block and page granularities. Referenced and changed status is maintained by the processor for each page to support demand-paged virtual memory systems. The LSU calculates effective addresses for data loads and stores; the instruction unit calculates effective addresses for instruction fetching. The MMU translates the effective address to determine the correct physical address for the memory access. The MPC750 supports the following types of memory translation:
* Real addressing mode--In this mode, translation is disabled by clearing bits in the machine state register (MSR): MSR[IR] for instruction fetching or MSR[DR] for data accesses. When address translation is disabled, the physical address is identical to the effective address.
* Page address translation--translates the page frame address for a 4-Kbyte page size
* Block address translation--translates the base address for blocks (128 Kbytes to 256 Mbytes)
If translation is enabled, the appropriate MMU translates the higher-order bits of the effective address into physical address bits. The lower-order address bits (that are untranslated and therefore, considered both logical and physical) are directed to the on-chip caches where they form the index into the eight-way set-associative tag array. After translating the address, the MMU passes the higher-order physical address bits to the cache and the cache lookup completes. For caching-inhibited accesses or accesses that miss in the cache, the untranslated lower-order address bits are concatenated with the translated higher-order address bits; the resulting 32-bit physical address is used by the memory unit and the system interface, which accesses external memory. The TLBs store page address translations for recent memory accesses. For each access, an effective address is presented for page and block translation simultaneously. If a translation is found in both the TLB and the BAT array, the block address translation in the BAT array is used. Usually the translation is in a TLB and the physical address is readily available to the on-chip cache. When a page address translation is not in a TLB, hardware searches for one in the page table following the model defined by the PowerPC architecture. Instruction and data TLBs provide address translation in parallel with the on-chip cache access, incurring no additional time penalty in the event of a TLB hit. The MPC750's TLBs are 128-entry, two-way set-associative caches that contain instruction and data address translations. The MPC750 automatically generates a TLB search on a TLB miss.
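The selection among real addressing, block translation, and page translation, and the way untranslated low-order bits are joined to translated high-order bits, can be sketched as follows. This is a simplified illustration, not the hardware algorithm: the function names are hypothetical, and the BAT and TLB lookups themselves are reduced to boolean hit flags supplied by the caller.

```python
PAGE_OFFSET_BITS = 12  # a 4-Kbyte page leaves the low 12 address bits untranslated


def select_translation(translation_enabled, bat_hit, tlb_hit):
    """Return which translation path supplies the physical address."""
    if not translation_enabled:      # MSR[IR] or MSR[DR] cleared
        return "real"                # physical address equals effective address
    if bat_hit:                      # a BAT hit is used even if the TLB also hits
        return "block"
    if tlb_hit:
        return "page (TLB hit)"
    return "page (hardware table search)"


def page_physical_address(effective_address, physical_page_number):
    """Concatenate the translated page number with the untranslated page offset."""
    offset = effective_address & ((1 << PAGE_OFFSET_BITS) - 1)
    return ((physical_page_number << PAGE_OFFSET_BITS) | offset) & 0xFFFFFFFF
```

For example, `page_physical_address(0x12345ABC, 0x42)` keeps the offset `0xABC` and yields `0x00042ABC`; the page number `0x42` is an arbitrary illustrative value.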
1.2.4
On-Chip Instruction and Data Caches
The MPC750 implements separate instruction and data caches. Each cache is 32-Kbyte and eight-way set associative. As defined by the PowerPC architecture, they are physically indexed. Each cache block contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits EA[27-31] are zeros); thus, a cache block never crosses a page boundary. An entire cache block can be updated by a four-beat burst load. Misaligned accesses across a page boundary can incur a performance penalty. Caches are nonblocking, write-back caches with hardware support for reloading on cache misses. The critical double word is transferred on the first beat and is simultaneously written to the cache and forwarded to the requesting unit, minimizing stalls due to load delays. The cache being loaded is not blocked to internal accesses while the load completes. The MPC750 cache organization is shown in Figure 1-2.
[Figure 1-2 placeholder: the cache is organized as 128 sets; each set contains eight blocks (Block 0 through Block 7); each block consists of an address tag (Address Tag 0 through Address Tag 7), state bits, and eight words (Words [0-7]).]
Figure 1-2. Cache Organization
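The set count follows directly from the cache size, associativity, and block size given above. The sketch below works through that arithmetic and splits an address into tag, set index, and byte offset; the function name `split_address` is hypothetical, and treating all bits above the set index as the tag is a simplifying assumption.

```python
CACHE_BYTES = 32 * 1024   # 32-Kbyte cache
WAYS = 8                  # eight-way set associative
BLOCK_BYTES = 32          # eight 4-byte words per cache block

SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)


def split_address(addr):
    """Split a 32-bit address into (tag, set_index, byte_offset)."""
    byte_offset = addr & (BLOCK_BYTES - 1)
    set_index = (addr // BLOCK_BYTES) % SETS
    tag = addr // (BLOCK_BYTES * SETS)
    return tag, set_index, byte_offset
```

Because 128 sets of 32-byte blocks span exactly 4 Kbytes, the set index and byte offset fit inside the untranslated page offset, which is why the cache lookup can begin in parallel with address translation.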
Within one cycle, the data cache provides double-word access to the LSU. Like the instruction cache, the data cache can be invalidated all at once or on a per-cache-block basis. The data cache can be disabled and invalidated by clearing HID0[DCE] and setting HID0[DCFI]. The data cache can be locked by setting HID0[DLOCK]. To ensure cache coherency, the data cache supports the three-state MEI protocol. The data cache tags are single-ported, so a simultaneous load or store and a snoop access represent a resource collision. If a snoop hit occurs, the LSU is blocked internally for one cycle to allow the eight-word block of data to be copied to the write-back buffer.
Within one cycle, the instruction cache provides up to four instructions to the instruction queue. The instruction cache can be invalidated entirely or on a cache-block basis. The instruction cache can be disabled and invalidated by clearing HID0[ICE] and setting HID0[ICFI]. The instruction cache can be locked by setting HID0[ILOCK]. The instruction cache supports only the valid/invalid states. The MPC750 also implements a 64-entry (16-set, four-way set-associative) branch target instruction cache (BTIC). The BTIC is a cache of branch instructions that have been encountered in branch/loop code sequences. If the target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically the BTIC contains the first two instructions in the target stream. The BTIC can be disabled and invalidated through software. For more information and timing examples showing cache hit and cache miss latencies, see Section 6.3.2, "Instruction Fetch Timing."
1.2.5
L2 Cache Implementation (Not Supported in the MPC740)
The L2 cache is a unified cache that receives memory requests from both the L1 instruction and data caches independently. The L2 cache is implemented with an on-chip, two-way, set-associative tag memory, and with external, synchronous SRAMs for data storage. The external SRAMs are accessed through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache normally operates in write-back mode and supports system cache coherency through snooping. Depending on its size, the L2 cache is organized into 64- or 128-byte lines, which in turn are subdivided into 32-byte sectors (blocks), the unit at which cache coherency is maintained. The L2 cache controller contains the L2 cache control register (L2CR), which includes bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. The L2 cache controller also manages the L2 cache tag array, two-way set-associative with 4K tags per way. Each sector (32-byte cache block) has its own valid and modified status bits. Requests from the L1 cache generally result from instruction misses, data load or store misses, write-through operations, or cache management instructions. Requests from the L1 cache are looked up in the L2 tags and serviced by the L2 cache if they hit; they are forwarded to the bus interface if they miss. The L2 cache can accept multiple, simultaneous accesses. The L1 instruction cache can request an instruction at the same time that the L1 data cache is requesting one load and two store operations. The L2 cache also services snoop requests from the bus. If there are multiple pending requests to the L2 cache, snoop requests have highest priority. The next
priority consists of load and store requests from the L1 data cache. The next priority consists of instruction fetch requests from the L1 instruction cache. For more information, see Chapter 9, "L2 Cache Interface Operation."
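The L2 line organization and request priority described in this section can be summarized in a short sketch. The arithmetic uses only the sizes stated above; the names `l2_geometry`, `next_request`, and the request labels are hypothetical, chosen for illustration.

```python
SECTOR_BYTES = 32      # coherency is maintained per 32-byte sector
L2_WAYS = 2            # two-way set-associative tags
TAGS_PER_WAY = 4096    # 4K tags per way

# Line size depends on the configured L2 cache size.
LINE_BYTES = {256 * 1024: 64, 512 * 1024: 64, 1024 * 1024: 128}

# Highest priority first: bus snoops, then L1 data, then L1 instruction requests.
L2_PRIORITY = ("snoop", "l1_data", "l1_instruction")


def l2_geometry(size_bytes):
    """Return line size, sectors per line, and lines per way for an L2 size."""
    line = LINE_BYTES[size_bytes]
    return {
        "line_bytes": line,
        "sectors_per_line": line // SECTOR_BYTES,
        "lines_per_way": size_bytes // (L2_WAYS * line),
    }


def next_request(pending):
    """Pick the highest-priority request type among those pending, or None."""
    for kind in L2_PRIORITY:
        if kind in pending:
            return kind
    return None
```

By this arithmetic, the 512-Kbyte and 1-Mbyte configurations each need 4096 lines per way, matching the 4K tags per way, while the 256-Kbyte configuration needs 2048.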
1.2.6
System Interface/Bus Interface Unit (BIU)
The address and data buses operate independently; address and data tenures of a memory access are decoupled to provide a more flexible control of memory traffic. The primary activity of the system interface is transferring data and instructions between the processor and system memory. There are two types of memory accesses:
* Single-beat transfers--These memory accesses allow transfer sizes of 8, 16, 24, 32, or 64 bits in one bus clock cycle. Single-beat transactions are caused by uncacheable read and write operations that access memory directly (that is, when caching is disabled), cache-inhibited accesses, and stores in write-through mode.
* Four-beat burst (32 bytes) data transfers--Burst transactions, which always transfer an entire cache block (32 bytes), are initiated when an entire cache block is transferred. Because the first-level caches on the MPC750 are write-back caches, burst-read memory operations are the most common memory accesses, followed by burst-write memory operations, and single-beat (noncacheable or write-through) memory read and write operations.
The MPC750 also supports address-only operations, variants of the burst and single-beat operations (for example, atomic memory operations and global memory operations that are snooped), and address retry activity (for example, when a snooped read access hits a modified block in the cache). The broadcast of some address-only operations is controlled through HID0[ABE]. I/O accesses use the same protocol as memory accesses. Access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the MPC750 to be integrated into systems that implement various fairness and bus parking procedures to avoid arbitration overhead. Typically, memory accesses are weakly ordered--sequences of operations, including load/store string and multiple instructions, do not necessarily complete in the order they begin--maximizing the efficiency of the bus without sacrificing data coherency. The MPC750 allows read operations to go ahead of store operations (except when a dependency exists, or in cases where a noncacheable access is performed), and provides support for a write operation to go ahead of a previously queued read data tenure (for example, letting a snoop push be enveloped between address and data tenures of a read operation). Because the MPC750 can dynamically optimize run-time ordering of load/store traffic, overall performance is improved. The system interface is specific for each microprocessor.
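The relationship between the two transfer types described earlier in this section, the 64-bit data bus, and the 32-byte cache block can be checked with a few lines of arithmetic. This sketch is illustrative; the function name `data_beats` and its parameters are hypothetical simplifications of the conditions listed above.

```python
DATA_BUS_BYTES = 8                      # 64-bit external data bus
CACHE_BLOCK_BYTES = 32                  # one cache block per burst
SINGLE_BEAT_BITS = (8, 16, 24, 32, 64)  # transfer sizes allowed in one beat

BURST_BEATS = CACHE_BLOCK_BYTES // DATA_BUS_BYTES


def data_beats(cacheable, write_through_store=False):
    """Return the number of data beats for an access.

    Cache-inhibited accesses and write-through stores are single-beat;
    transfers of a whole cache block use a four-beat burst.
    """
    if not cacheable or write_through_store:
        return 1
    return BURST_BEATS
```

Here `BURST_BEATS` is computed rather than hard-coded to show that the four-beat burst follows from dividing the 32-byte block by the 8-byte bus width.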
The MPC750 signals are grouped as shown in Figure 1-3. Signals are provided for clocking and control of the L2 caches, as well as separate L2 address and data buses. Test and control signals provide diagnostics for selected internal circuits.
[Figure 1-3 placeholder: block diagram of the MPC750 and its signal groups: Address Arbitration, Address Start, Address Transfer, Transfer Attribute, Address Termination, Clocks, System Status, Data Arbitration, Data Transfer, Data Termination, L2 Cache Clock/Control (note 1), L2 Cache Address/Data (note 1), Processor Status/Control, and Test and Control, with power inputs VDD and VDD (I/O).]
Note 1: Not supported in the MPC740
Figure 1-3. System Interface
The system interface supports address pipelining, which allows the address tenure of one transaction to overlap the data tenure of another. The extent of the pipelining depends on external arbitration and control circuitry. Similarly, the MPC750 supports split-bus transactions for systems with multiple potential bus masters--one device can have mastership of the address bus while another has mastership of the data bus. Allowing multiple bus transactions to occur simultaneously increases the available bus bandwidth for other activity. The MPC750's clocking structure supports a wide range of processor-to-bus clock ratios.
1.2.7
Signals
The MPC750's signals are grouped as follows:
* Address arbitration signals--The MPC750 uses these signals to arbitrate for address bus mastership.
* Address start signals--These signals indicate that a bus master has begun a transaction on the address bus.
* Address transfer signals--These signals include the address bus and address parity signals. They are used to transfer the address and to ensure the integrity of the transfer.
* Transfer attribute signals--These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or caching-inhibited.
* Address termination signals--These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated.
* Data arbitration signals--The MPC750 uses these signals to arbitrate for data bus mastership.
* Data transfer signals--These signals, which consist of the data bus and data parity signals, are used to transfer the data and to ensure the integrity of the transfer.
* Data termination signals--Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, a data termination signal also indicates the end of the tenure; in burst accesses, data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated.
* L2 cache clock/control signals--These signals provide clocking and control for the L2 cache. (Not supported in the MPC740.)
* L2 cache address/data--The MPC750 has separate address and data buses for accessing the L2 cache. (Not supported in the MPC740.)
* Interrupt signals--These signals include the interrupt signal, checkstop signals, and both soft reset and hard reset signals. These signals are used to generate interrupt exceptions and, under various conditions, to reset the processor.
* Processor status/control signals--These signals are used to set the reservation coherency bit, enable the time base, and other functions.
* Miscellaneous signals--These signals are used in conjunction with such resources as secondary caches and the time base facility.
* JTAG/COP interface signals--The common on-chip processor (COP) unit provides a serial interface to the system for performing board-level boundary scan interconnect tests.
* Clock signals--These signals determine the system clock frequency. These signals can also be used to synchronize multiprocessor systems.
NOTE
A bar over a signal name indicates that the signal is active low--for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high.
Signals that are not active low, such as AP[0-3] (address bus parity signals) and TT[0-4] (transfer type signals) are referred to as asserted when they are high and negated when they are low.
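The asserted/negated convention in the note can be expressed as a small helper. This is only an illustrative sketch: the two sets below contain just the example signals named in the note, and the function name `is_asserted` is hypothetical.

```python
# Example signals named in the note; not a complete list of MPC750 signals.
ACTIVE_LOW = {"ARTRY", "TS"}
ACTIVE_HIGH = {"AP", "TT"}


def is_asserted(signal, level):
    """Return True if the signal is asserted at the given electrical level.

    level is 0 for electrically low and 1 for electrically high.
    """
    if signal in ACTIVE_LOW:
        return level == 0
    if signal in ACTIVE_HIGH:
        return level == 1
    raise KeyError(f"polarity of {signal!r} is not listed in this sketch")
```

For instance, `is_asserted("TS", 0)` is true because TS is active low, while `is_asserted("TT", 0)` is false because TT is asserted when high.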
1.2.8
Signal Configuration
Figure 1-4 shows the MPC750's logical pin configuration. The signals are grouped by function.
[Figure 1-4 placeholder: logical pin diagram of the MPC750. Signal groups and signals recoverable from the figure:
Address Arbitration: BR, BG, ABB
Address Start: TS
Address Bus: A[0-31], AP[0-3]
Transfer Attributes: TT[0-4], TBST, TSIZ[0-2], GBL, WT, CI
Address Termination: AACK, ARTRY
Data Arbitration: DBG, DBWO, DBB
Data Transfer: D[0-63], DP[0-7], DBDIS
Data Termination: TA, DRTRY, TEA
L2 Cache Address/Data: L2ADDR[16-0], L2DATA[0-63], L2DP[0-7]
L2 Cache Clock/Control: L2CE, L2WE, L2CLK_OUT[A-B], L2SYNC_OUT, L2SYNC_IN, L2ZZ
Interrupts/Resets: INT, SMI, MCP, SRESET, HRESET, CKSTP_IN, CKSTP_OUT
Processor Status/Control: RSRV, TBEN, TLBISYNC, QREQ, QACK
Clock Control: SYSCLK, PLL_CFG[0-3], CLK_OUT
Test Interface: JTAG/COP, Factory Test
Power: VDD, VDD (I/O), AVDD, L2VDD, L2AVDD
The L2 cache address/data and clock/control signals are not supported in the MPC740.]
Figure 1-4. MPC750 Microprocessor Signal Groups
Signal functionality is described in detail in Chapter 7, "Signal Descriptions," and Chapter 8, "System Interface Operation."
1.2.9
Clocking
The MPC750 requires a single system clock input, SYSCLK, that represents the bus interface frequency. Internally, the processor uses a phase-locked loop (PLL) circuit to
generate a master core clock that is frequency-multiplied and phase-locked to the SYSCLK input. This core frequency is used to operate the internal circuitry. The PLL is configured by the PLL_CFG[0-3] signals, which select the multiplier that the PLL uses to multiply the SYSCLK frequency up to the internal core frequency. The feedback in the PLL guarantees that the processor clock is phase locked to the bus clock, regardless of process variations, temperature changes, or parasitic capacitances. The PLL also ensures a 50% duty cycle for the processor clock. The MPC750 supports various processor-to-bus clock frequency ratios, although not all ratios are available for all frequencies. Configuration of the processor/bus clock ratios is displayed through an MPC750-specific register, HID1. For information about supported clock frequencies, see the MPC750 hardware specifications.
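The relationship between SYSCLK and the core clock is a simple multiplication, which the following sketch illustrates. It assumes the ratio set from the feature list (2x to 8x in half-clock steps) and does not model the PLL_CFG[0-3] encoding, which is not given here; the function name and the example frequency are hypothetical, and whether a particular combination is valid for a given part must be checked against the hardware specifications.

```python
# Ratios 2.0, 2.5, ... 8.0, per the feature list (all half-clock multipliers).
SUPPORTED_RATIOS = [r / 2 for r in range(4, 17)]


def core_frequency_mhz(sysclk_mhz, ratio):
    """Return the core frequency for a SYSCLK frequency and a PLL ratio."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported processor-to-bus ratio: {ratio}")
    return sysclk_mhz * ratio
```

For example, an illustrative 100-MHz SYSCLK with a 3.5x ratio gives `core_frequency_mhz(100, 3.5)`, a 350-MHz core clock.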
1.3
MPC750 Microprocessor Implementation
The PowerPC architecture is derived from the POWER architecture (Performance Optimized with Enhanced RISC architecture). The PowerPC architecture shares the benefits of the POWER architecture optimized for single-chip implementations. The PowerPC architecture design facilitates parallel instruction execution and is scalable to take advantage of future technological gains. This section describes the PowerPC architecture in general, and specific details about the implementation of the MPC750 as a low-power, 32-bit device that implements this architecture. The structure of this section follows the organization of the user's manual; each subsection provides an overview of each chapter.
* Registers and programming model--Section 1.4, "PowerPC Registers and Programming Model," describes the registers for the operating environment architecture common among processors of this family and describes the programming model. It also describes the registers that are unique to the MPC750. The information in this section is described more fully in Chapter 2, "Programming Model."
* Instruction set and addressing modes--Section 1.5, "Instruction Set," describes the PowerPC instruction set and addressing modes for the PowerPC operating environment architecture, and defines and describes the PowerPC instructions implemented in the MPC750. The information in this section is described more fully in Chapter 2, "Programming Model."
* Cache implementation--Section 1.6, "On-Chip Cache Implementation," describes the cache model that is defined generally by the virtual environment architecture. It also provides specific details about the MPC750 cache implementation. The information in this section is described more fully in Chapter 3, "L1 Instruction and Data Cache Operation."
* Exception model--Section 1.7, "Exception Model," describes the exception model of the PowerPC operating environment architecture and the differences in the MPC750 exception model. The information in this section is described more fully in Chapter 4, "Exceptions."
* Memory management--Section 1.8, "Memory Management," describes generally the conventions for memory management among the processors of this family. This section also describes the MPC750's implementation of the 32-bit PowerPC memory management specification. The information in this section is described more fully in Chapter 5, "Memory Management."
* Instruction timing--Section 1.9, "Instruction Timing," provides a general description of the instruction timing provided by the superscalar, parallel execution supported by the PowerPC architecture and the MPC750. The information in this section is described more fully in Chapter 6, "Instruction Timing."
* Power management--Section 1.10, "Power Management," describes how the power management can be used to reduce power consumption when the processor, or portions of it, are idle. The information in this section is described more fully in Chapter 10, "Power and Thermal Management."
* Thermal management--Section 1.11, "Thermal Management," describes how the thermal management unit and its associated registers (THRM1-THRM3) and exception can be used to manage system activity in a way that prevents exceeding system and junction temperature thresholds. This is particularly useful in high-performance portable systems, which cannot use the same cooling mechanisms (such as fans) that control overheating in desktop systems. The information in this section is described more fully in Chapter 10, "Power and Thermal Management."
* Performance monitor--Section 1.12, "Performance Monitor," describes the performance monitor facility, which system designers can use to help bring up, debug, and optimize software performance. The information in this section is described more fully in Chapter 11, "Performance Monitor."
The following sections summarize the features of the MPC750, distinguishing those that are defined by the architecture from those that are unique to the MPC750 implementation.

The PowerPC architecture consists of the following layers, and adherence to the PowerPC architecture can be described in terms of which of the following levels of the architecture is implemented:
* PowerPC user instruction set architecture (UISA)--Defines the base user-level instruction set, user-level registers, data types, floating-point exception model, memory models for a uniprocessor environment, and programming model for a uniprocessor environment.
* PowerPC virtual environment architecture (VEA)--Describes the memory model for a multiprocessor environment, defines cache control instructions, and describes other aspects of virtual environments. Implementations that conform to the VEA also adhere to the UISA, but may not necessarily adhere to the OEA.
* PowerPC operating environment architecture (OEA)--Defines the memory management model, supervisor-level registers, synchronization requirements, and the exception model. Implementations that conform to the OEA also adhere to the UISA and the VEA.
The PowerPC architecture allows a wide range of designs for such features as cache and system interface implementations. The MPC750 implementations support the three levels of the architecture described above. For more information about the PowerPC architecture, see the Programming Environments Manual. Specific features of the MPC750 are listed in Section 1.2, "MPC750 Microprocessor Features."
1.4 PowerPC Registers and Programming Model
The PowerPC architecture defines register-to-register operations for most computational instructions. Source operands for these instructions are accessed from the registers or are provided as immediate values embedded in the instruction opcode. The three-register instruction format allows specification of a target register distinct from the two source operands. Load and store instructions transfer data between registers and memory. Processors of this family have two levels of privilege--supervisor mode of operation (typically used by the operating system) and user mode of operation (used by the application software). The programming models incorporate 32 GPRs, 32 FPRs, special-purpose registers (SPRs), and several miscellaneous registers. Each microprocessor also has its own unique set of hardware implementation-dependent (HID) registers. Having access to privileged instructions, registers, and other resources allows the operating system to control the application environment (providing virtual memory and protecting operating-system and critical machine resources). Instructions that control the state of the processor, the address translation mechanism, and supervisor registers can be executed only when the processor is operating in supervisor mode. Figure 1-5 shows all the MPC750 registers available at the user and supervisor level. The numbers to the right of the SPRs indicate the number that is used in the syntax of the instruction operands to access the register. For more information, see Chapter 2, "Programming Model."
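To make the role of the SPR number concrete, the following C sketch builds the 32-bit instruction words for mfspr and mtspr. It assumes the architecture's standard encoding (primary opcode 31, extended opcodes 339 and 467, and a 10-bit SPR field whose two 5-bit halves are swapped); the function names and the enum constants are illustrative and do not come from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* SPR numbers as listed in Figure 1-5; the constant names are hypothetical. */
enum { SPR_LR = 8, SPR_CTR = 9, SPR_PVR = 287, SPR_HID0 = 1008 };

/* The 10-bit SPR number is stored with its two 5-bit halves swapped. */
static uint32_t spr_field(uint32_t spr)
{
    assert(spr < 1024u);
    return ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
static uint32_t encode_mfspr(uint32_t rd, uint32_t spr)
{
    assert(rd < 32u);
    return (31u << 26) | (rd << 21) | (spr_field(spr) << 11) | (339u << 1);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static uint32_t encode_mtspr(uint32_t spr, uint32_t rs)
{
    assert(rs < 32u);
    return (31u << 26) | (rs << 21) | (spr_field(spr) << 11) | (467u << 1);
}
```

With these helpers, `encode_mfspr(3, SPR_PVR)` yields 0x7C7F42A6, the word for reading the PVR into GPR3, and `encode_mtspr(SPR_LR, 0)` yields 0x7C0803A6. An assembler normally does this work; the sketch only shows where the number from the figure ends up.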
USER MODEL--UISA
* General-Purpose Registers: GPR0, GPR1, ..., GPR31
* Floating-Point Registers: FPR0, FPR1, ..., FPR31
* Condition Register: CR
* Floating-Point Status and Control Register: FPSCR
* XER: XER (SPR 1)
* Link Register: LR (SPR 8)
* Count Register: CTR (SPR 9)
* Performance Monitor Registers (For Reading)
-- Performance Counters (1): UPMC1 (SPR 937), UPMC2 (SPR 938), UPMC3 (SPR 941), UPMC4 (SPR 942)
-- Sampled Instruction Address (1): USIA (SPR 939)
-- Monitor Control (1): UMMCR0 (SPR 936), UMMCR1 (SPR 940)

USER MODEL--VEA
* Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269)

SUPERVISOR MODEL--OEA
* Configuration Registers
-- Hardware Implementation Registers (1): HID0 (SPR 1008), HID1 (SPR 1009)
-- Processor Version Register: PVR (SPR 287)
-- Machine State Register: MSR
* Memory Management Registers
-- Instruction BAT Registers: IBAT0U (SPR 528), IBAT0L (SPR 529), IBAT1U (SPR 530), IBAT1L (SPR 531), IBAT2U (SPR 532), IBAT2L (SPR 533), IBAT3U (SPR 534), IBAT3L (SPR 535)
-- Data BAT Registers: DBAT0U (SPR 536), DBAT0L (SPR 537), DBAT1U (SPR 538), DBAT1L (SPR 539), DBAT2U (SPR 540), DBAT2L (SPR 541), DBAT3U (SPR 542), DBAT3L (SPR 543)
-- Segment Registers: SR0, SR1, ..., SR15
-- SDR1: SDR1 (SPR 25)
* Exception Handling Registers
-- SPRGs: SPRG0 (SPR 272), SPRG1 (SPR 273), SPRG2 (SPR 274), SPRG3 (SPR 275)
-- Data Address Register: DAR (SPR 19)
-- DSISR: DSISR (SPR 18)
-- Save and Restore Registers: SRR0 (SPR 26), SRR1 (SPR 27)
* Miscellaneous Registers
-- External Access Register: EAR (SPR 282)
-- Time Base (For Writing): TBL (SPR 284), TBU (SPR 285)
-- Data Address Breakpoint Register: DABR (SPR 1013)
-- L2 Control Register (1, 2): L2CR (SPR 1017)
-- Instruction Address Breakpoint Register (1): IABR (SPR 1010)
-- Decrementer: DEC (SPR 22)
* Performance Monitor Registers
-- Performance Counters (1): PMC1 (SPR 953), PMC2 (SPR 954), PMC3 (SPR 957), PMC4 (SPR 958)
-- Sampled Instruction Address (1): SIA (SPR 955)
-- Monitor Control (1): MMCR0 (SPR 952), MMCR1 (SPR 956)
* Power/Thermal Management Registers
-- Thermal Assist Unit Registers (1): THRM1 (SPR 1020), THRM2 (SPR 1021), THRM3 (SPR 1022)
-- Instruction Cache Throttling Control Register (1): ICTC (SPR 1019)

Notes:
1 These registers are MPC750-specific registers. They may not be supported by other processors that implement the PowerPC architecture.
2 Not supported by the MPC740.

Figure 1-5. MPC750 Microprocessor Programming Model--Registers
The following tables summarize the registers implemented in the MPC750; Table 1-1 describes registers (excluding SPRs) defined by the architecture.
Table 1-1. Architecture-Defined Registers on the MPC750 (Excluding SPRs)
Register | Level | Function
CR | User | The condition register (CR) consists of eight four-bit fields that reflect the results of certain operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and provide a mechanism for testing and branching.
FPRs | User | The 32 floating-point registers (FPRs) serve as the data source or destination for floating-point instructions. These 64-bit registers can hold either single- or double-precision floating-point values.
FPSCR | User | The floating-point status and control register (FPSCR) contains the floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE-754 standard.
GPRs | User | The 32 GPRs serve as the data source or destination for integer instructions.
MSR | Supervisor | The machine state register (MSR) defines the processor state. Its contents are saved when an exception is taken and restored when exception handling completes. The MPC750 implements MSR[POW] (defined by the architecture as optional), which is used to enable the power management feature. The MPC750-specific MSR[PM] bit is used to mark a process for the performance monitor.
SR0-SR15 | Supervisor | The sixteen 32-bit segment registers (SRs) define the 4-Gbyte space as sixteen 256-Mbyte segments. The MPC750 implements segment registers as two arrays--a main array for data accesses and a shadow array for instruction accesses; see Figure 1-1. Loading a segment entry with the Move to Segment Register (mtsr) instruction loads both arrays. The mfsr instruction reads the master register, shown as part of the data MMU in Figure 1-1.
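The CR layout in Table 1-1 (eight four-bit fields) can be illustrated with a small C helper. This is a sketch that assumes the architecture's numbering, in which CR0 is the most-significant field and each field carries LT, GT, EQ, and SO bits after an integer compare; the names `cr_field` and `CR_*` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks inside one 4-bit CR field, as set by an integer compare. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Return CR field n (0..7). CR0 occupies the most-significant four bits. */
static unsigned cr_field(uint32_t cr, unsigned n)
{
    assert(n < 8u);
    return (unsigned)((cr >> (28u - 4u * n)) & 0xFu);
}
```

For example, `cr_field(cr, 0) & CR_EQ` tests whether the last compare that targeted CR0 found its operands equal.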
The OEA defines numerous special-purpose registers that serve a variety of functions, such as providing controls, indicating status, configuring the processor, and performing special operations. During normal execution, a program can access the registers shown in Figure 1-5, depending on the program's access privilege (supervisor or user, determined by the privilege-level (PR) bit in the MSR). GPRs and FPRs are accessed through operands that are part of the instructions. Access to registers can be explicit (that is, through the use of specific instructions for that purpose, such as the Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit, as part of the execution of an instruction. Some registers can be accessed both explicitly and implicitly. In the MPC750, all SPRs are 32 bits wide. Table 1-2 describes the architecture-defined SPRs implemented by the MPC750. The Programming Environments Manual describes these registers in detail, including bit descriptions. Section 2.1.1, "Register Set," describes how these registers are implemented in the MPC750. In particular, that section describes which of the features that the PowerPC architecture defines as optional are implemented on the MPC750.
Table 1-2. Architecture-Defined SPRs Implemented by the MPC750
Register | Level | Function
LR | User | The link register (LR) can be used to provide the branch target address and to hold the return address after branch and link instructions.
BATs | Supervisor | The architecture defines 16 block address translation registers (BATs), which operate in pairs. There are four pairs of data BATs (DBATs) and four pairs of instruction BATs (IBATs). BATs are used to define and configure blocks of memory.
CTR | User | The count register (CTR) is decremented and tested by branch-and-count instructions.
DABR | Supervisor | The optional data address breakpoint register (DABR) supports the data address breakpoint facility.
DAR | Supervisor | The data address register (DAR) holds the address of an access after an alignment or DSI exception.
DEC | Supervisor | The decrementer register (DEC) is a 32-bit decrementing counter that provides a way to schedule decrementer exceptions.
DSISR | Supervisor | The DSISR defines the cause of data access and alignment exceptions.
EAR | Supervisor | The external access register (EAR) controls access to the external access facility through the External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx) instructions.
PVR | Supervisor | The processor version register (PVR) is a read-only register that identifies the processor.
SDR1 | Supervisor | SDR1 specifies the page table format used in virtual-to-physical page address translation.
SRR0 | Supervisor | The machine status save/restore register 0 (SRR0) saves the address used for restarting an interrupted program when a Return from Interrupt (rfi) instruction executes.
SRR1 | Supervisor | The machine status save/restore register 1 (SRR1) is used to save machine status on exceptions and to restore machine status when an rfi instruction is executed.
SPRG0-SPRG3 | Supervisor | SPRG0-SPRG3 are provided for operating system use.
TB | User: read; Supervisor: read/write | The time base register (TB) is a 64-bit register that maintains the time of day and operates interval timers. The TB consists of two 32-bit fields--time base upper (TBU) and time base lower (TBL).
XER | User | The XER contains the summary overflow bit, integer carry bit, overflow bit, and a field specifying the number of bytes to be transferred by a Load String Word Indexed (lswx) or Store String Word Indexed (stswx) instruction.
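Because the 64-bit TB is exposed as two 32-bit halves, software on a 32-bit implementation has to guard against TBL carrying into TBU between the two reads. The C sketch below shows the usual re-read loop. The accessor type and the `sim_*` functions are hypothetical stand-ins for the hardware reads (a simulated counter is used so the logic can be exercised off-target).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor type; on hardware these would wrap TBU/TBL reads. */
typedef uint32_t (*tb_read_fn)(void);

/* Read TBU, TBL, then TBU again; retry if TBU changed in between. */
static uint64_t read_time_base(tb_read_fn read_tbu, tb_read_fn read_tbl)
{
    uint32_t upper, lower, upper_again;
    assert(read_tbu != 0 && read_tbl != 0);
    do {
        upper = read_tbu();
        lower = read_tbl();
        upper_again = read_tbu();   /* differs if TBL carried into TBU */
    } while (upper != upper_again);
    return ((uint64_t)upper << 32) | lower;
}

/* Simulated time base for off-target testing: ticks once per TBL read. */
static uint64_t sim_tb;
static uint32_t sim_read_tbu(void) { return (uint32_t)(sim_tb >> 32); }
static uint32_t sim_read_tbl(void)
{
    uint32_t value = (uint32_t)sim_tb;
    sim_tb += 1;
    return value;
}
```

Starting the simulated counter at 0x00000001FFFFFFFF forces one retry, because the first TBL read is followed by a carry into TBU.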
Table 1-3 describes the supervisor-level SPRs in the MPC750 that are not defined by the PowerPC architecture. Section 2.1.2, "MPC750-Specific Registers," gives detailed descriptions of these registers, including bit descriptions.
Table 1-3. MPC750-Specific Registers
Register | Level | Function
HID0 | Supervisor | The hardware implementation-dependent register 0 (HID0) provides checkstop enables and other functions.
HID1 | Supervisor | The hardware implementation-dependent register 1 (HID1) allows software to read the configuration of the PLL configuration signals.
IABR | Supervisor | The instruction address breakpoint register (IABR) supports instruction address breakpoint exceptions. It can hold an address to compare with instruction addresses in the IQ. An address match causes an instruction address breakpoint exception.
ICTC | Supervisor | The instruction cache-throttling control register (ICTC) has bits for controlling the interval at which instructions are fetched into the instruction buffer in the instruction unit. This helps control the MPC750's overall junction temperature.
L2CR | Supervisor | The L2 cache control register (L2CR) is used to configure and operate the L2 cache. It has bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. (The L2 cache feature is not supported in the MPC740.)
MMCR0-MMCR1 | Supervisor | The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitoring interrupt functions. UMMCR0-UMMCR1 provide user-level read access to MMCR0-MMCR1.
PMC1-PMC4 | Supervisor | The performance monitor counter registers (PMC1-PMC4) are used to count specified events. UPMC1-UPMC4 provide user-level read access to these registers.
SIA | Supervisor | The sampled instruction address register (SIA) holds the EA of an instruction executing at or around the time the processor signals the performance monitor interrupt condition. The USIA register provides user-level read access to the SIA.
THRM1, THRM2 | Supervisor | THRM1 and THRM2 provide a way to compare the junction temperature against two user-provided thresholds. The thermal assist unit (TAU) can be operated so that the thermal sensor output is compared to only one threshold, selected in THRM1 or THRM2.
THRM3 | Supervisor | THRM3 is used to enable the TAU and to control the output sample time.
UMMCR0-UMMCR1 | User | The user monitor mode control registers (UMMCR0-UMMCR1) provide user-level read access to MMCR0-MMCR1.
UPMC1-UPMC4 | User | The user performance monitor counter registers (UPMC1-UPMC4) provide user-level read access to PMC1-PMC4.
USIA | User | The user sampled instruction address register (USIA) provides user-level read access to the SIA register.
1.5 Instruction Set
All PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formats are consistent among all instruction types, permitting efficient decoding to occur in parallel with operand accesses. This fixed instruction length and consistent format greatly simplify instruction pipelining. For more information, see Chapter 2, "Programming Model."
1.5.1 PowerPC Instruction Set
The PowerPC instructions are divided into the following categories:
* Integer instructions--These include computational and logical instructions.
-- Integer arithmetic instructions
-- Integer compare instructions
-- Integer logical instructions
-- Integer rotate and shift instructions
* Floating-point instructions--These include floating-point computational instructions, as well as instructions that affect the FPSCR.
-- Floating-point arithmetic instructions
-- Floating-point multiply/add instructions
-- Floating-point rounding and conversion instructions
-- Floating-point compare instructions
-- Floating-point status and control instructions
* Load/store instructions--These include integer and floating-point load and store instructions.
-- Integer load and store instructions
-- Integer load and store multiple instructions
-- Floating-point load and store
-- Primitives used to construct atomic memory operations (lwarx and stwcx. instructions)
* Flow control instructions--These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow.
-- Branch and trap instructions
-- Condition register logical instructions
* Processor control instructions--These instructions are used for synchronizing memory accesses and management of caches, TLBs, and the segment registers.
-- Move to/from SPR instructions
-- Move to/from MSR
-- Synchronize
-- Instruction synchronize
-- Order loads and stores
* Memory control instructions--These instructions provide control of caches, TLBs, and SRs.
-- Supervisor-level cache management instructions
-- User-level cache instructions
-- Segment register manipulation instructions
-- Translation lookaside buffer management instructions

This grouping does not indicate the execution unit that executes a particular instruction or group of instructions.

Integer instructions operate on byte, half-word, and word operands. Floating-point instructions operate on single-precision (one word) and double-precision (one double word) floating-point operands. The PowerPC architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and word operand loads and stores between memory and a set of 32 GPRs. It also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (FPRs).

Computational instructions do not modify memory. To use a memory operand in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written back to the target location with distinct instructions.

Processors in this family follow the program flow when they are in the normal execution state. However, the flow of instructions can be interrupted directly by the execution of an instruction or by an asynchronous event. Either kind of exception may cause one of several components of the system software to be invoked.

Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. A carry from bit 0 is ignored in 32-bit implementations.
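The two points above (memory is changed only through explicit load and store steps, and effective addresses wrap modulo 2^32) can be modeled in a few lines of C. This is a sketch, not processor code: `effective_address` assumes a base-plus-signed-16-bit-displacement form such as the one used by lwz rD,d(rA), and both function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* EA = base + sign-extended displacement, computed modulo 2^32.
 * Unsigned arithmetic discards the carry out of the top bit. */
static uint32_t effective_address(uint32_t base, int16_t disp)
{
    return base + (uint32_t)(int32_t)disp;
}

/* Load-modify-store: the computation happens in a "register" (a local),
 * and memory changes only in the final store step. */
static void increment_word(uint32_t *mem, uint32_t word_index)
{
    assert(mem != 0);
    uint32_t reg = mem[word_index];   /* load */
    reg += 1u;                        /* compute in a register */
    mem[word_index] = reg;            /* store */
}
```

For instance, a base of 0xFFFFFFF0 with a displacement of 0x20 gives an EA of 0x00000010 rather than a 33-bit result.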
1.5.2 MPC750 Microprocessor Instruction Set
The MPC750 instruction set is defined as follows:
* The MPC750 provides hardware support for all 32-bit PowerPC instructions.
* The MPC750 implements the following instructions optional to the PowerPC architecture:
-- External Control In Word Indexed (eciwx)
-- External Control Out Word Indexed (ecowx)
-- Floating Select (fsel)
-- Floating Reciprocal Estimate Single-Precision (fres)
-- Floating Reciprocal Square Root Estimate (frsqrte)
-- Store Floating-Point as Integer Word (stfiwx)
1.6 On-Chip Cache Implementation
The following subsections describe the PowerPC architecture's treatment of cache in general, and the MPC750-specific implementation, respectively. A detailed description of the MPC750 cache implementation is provided in Chapter 3, "L1 Instruction and Data Cache Operation."
1.6.1 PowerPC Cache Model
The PowerPC architecture does not define hardware aspects of cache implementations. For example, processors can have unified caches, separate instruction and data caches (Harvard architecture), or no cache at all. The microprocessors control the following memory access modes on a page or block basis:
* Write-back/write-through mode
* Caching-inhibited mode
* Memory coherency
The caches are physically addressed, and the data cache can operate in either write-back or write-through mode as specified by the PowerPC architecture. The PowerPC architecture defines the term 'cache block' as the cacheable unit. The VEA and OEA define cache management instructions a programmer can use to affect cache contents.
1.6.2 MPC750 Microprocessor Cache Implementation
The MPC750 cache implementation is described in Section 1.2.4, "On-Chip Instruction and Data Caches," and Section 1.2.5, "L2 Cache Implementation (Not Supported in the MPC740)." The BPU also contains a 64-entry BTIC that provides immediate access to cached target instructions. For more information, see Section 1.2.2.2, "Branch Processing Unit (BPU)."
1.7 Exception Model
The following sections describe the PowerPC exception model and the MPC750 implementation. A detailed description of the MPC750 exception model is provided in Chapter 4, "Exceptions."
1.7.1 PowerPC Exception Model
The PowerPC exception mechanism allows the processor to interrupt the instruction flow to handle certain situations caused by external signals, errors, or unusual conditions arising from the instruction execution. When exceptions occur, information about the state of the processor is saved to certain registers and the processor begins execution at an address (exception vector) predetermined for each exception. Exception processing occurs in supervisor mode.
Although multiple exception conditions can map to a single exception vector, a more specific condition may be determined by examining a register associated with the exception--for example, the DSISR and the FPSCR. Additionally, some exception conditions can be enabled or disabled explicitly by software.

The PowerPC architecture requires that exceptions be handled in program order; therefore, although a particular implementation may recognize exception conditions out of order, they are handled in order. When an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that are undispatched, are required to complete before the exception is taken, and any exceptions those instructions cause must also be handled first. Likewise, asynchronous, precise exceptions are recognized when they occur, but are not handled until the instructions currently in the completion queue successfully retire or generate an exception, and the completion queue is emptied.

Unless a catastrophic condition causes a system reset or machine check exception, only one exception is handled at a time. For example, if one instruction encounters multiple exception conditions, those conditions are handled sequentially. After the exception handler handles an exception, the instruction processing continues until the next exception condition is encountered. Recognizing and handling exception conditions sequentially guarantees that exceptions are recoverable.

When an exception is taken, information about the processor state before the exception was taken is saved in SRR0 and SRR1. Exception handlers should save the information stored in SRR0 and SRR1 early, before enabling external interrupts, to prevent the program state from being lost due to a system reset or machine check exception or to an instruction-caused exception in the exception handler.
The PowerPC architecture supports four types of exceptions:
* Synchronous, precise--These are caused by instructions. All instruction-caused exceptions are handled precisely; that is, the machine state at the time the exception occurs is known and can be completely restored. This means that (excluding the trap and system call exceptions) the address of the faulting instruction is provided to the exception handler and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the exception is taken. Once the exception is processed, execution resumes at the address of the faulting instruction (or at an alternate address provided by the exception handler). When an exception is taken due to a trap or system call instruction, execution resumes at an address provided by the handler.
* Synchronous, imprecise--The PowerPC architecture defines two imprecise floating-point exception modes, recoverable and nonrecoverable. Even though the MPC750 provides a means to enable the imprecise modes, it implements these modes identically to the precise mode (that is, enabled floating-point exceptions are always precise).
* Asynchronous, maskable--The PowerPC architecture defines external and decrementer interrupts as maskable, asynchronous exceptions. When these exceptions occur, their handling is postponed until the next instruction, and any exceptions associated with that instruction, completes execution. If no instructions are in the execution units, the exception is taken immediately upon determination of the correct restart address (for loading SRR0). As shown in Table 1-4, the MPC750 implements additional asynchronous, maskable exceptions.
* Asynchronous, nonmaskable--There are two nonmaskable asynchronous exceptions: system reset and the machine check exception. These exceptions may not be recoverable, or may provide a limited degree of recoverability. Exceptions report recoverability through the MSR[RI] bit.
1.7.2 MPC750 Microprocessor Exception Implementation
The MPC750 exception classes described above are shown in Table 1-4.

Table 1-4. MPC750 Microprocessor Exception Classifications
Synchronous/Asynchronous | Precise/Imprecise | Exception Type
Asynchronous, nonmaskable | Imprecise | Machine check, system reset
Asynchronous, maskable | Precise | External, decrementer, system management, performance monitor, and thermal management interrupts
Synchronous | Precise | Instruction-caused exceptions
Although exceptions have other characteristics, such as priority and recoverability, Table 1-4 describes categories of exceptions the MPC750 handles uniquely. Table 1-4 includes no synchronous imprecise exceptions; although the PowerPC architecture supports imprecise handling of floating-point exceptions, the MPC750 implements these exception modes precisely. Table 1-5 lists MPC750 exceptions and conditions that cause them. Exceptions specific to the MPC750 are indicated.
Table 1-5. Exceptions and Conditions
Exception Type | Vector Offset (hex) | Causing Conditions
Reserved | 00000 | --
System reset | 00100 | Assertion of either HRESET or SRESET or at power-on reset
Machine check | 00200 | Assertion of TEA during a data bus transaction, assertion of MCP, or an address, data, or L2 bus parity error. MSR[ME] must be set.
DSI | 00300 | As specified in the PowerPC architecture. For TLB misses on load, store, or cache operations, a DSI exception occurs if a page fault occurs.
ISI | 00400 | As defined by the PowerPC architecture.
External interrupt | 00500 | MSR[EE] = 1 and INT is asserted.
Alignment | 00600 | * A floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx or ecowx instruction operand is not word-aligned. * A multiple/string load/store operation is attempted in little-endian mode. * The operand of dcbz is in memory that is write-through-required or caching-inhibited or the cache is disabled.
Program | 00700 | As defined by the PowerPC architecture.
Floating-point unavailable | 00800 | As defined by the PowerPC architecture.
Decrementer | 00900 | As defined by the PowerPC architecture, when the most significant bit of the DEC register changes from 0 to 1 and MSR[EE] = 1.
Reserved | 00A00-00BFF | --
System call | 00C00 | Execution of the System Call (sc) instruction.
Trace | 00D00 | MSR[SE] = 1 or a branch instruction completes and MSR[BE] = 1. Unlike the architecture definition, isync does not cause a trace exception.
Reserved | 00E00 | The MPC750 does not generate an exception to this vector. Other processors may use this vector for floating-point assist exceptions.
Reserved | 00E10-00EFF | --
Performance monitor (1) | 00F00 | The limit specified in a PMC register is reached and MMCR0[ENINT] = 1.
Instruction address breakpoint (1) | 01300 | IABR[0-29] matches EA[0-29] of the next instruction to complete, IABR[TE] matches MSR[IR], and IABR[BE] = 1.
System management interrupt (1) | 01400 | MSR[EE] = 1 and SMI is asserted.
Reserved | 01500-016FF | --
Thermal management interrupt (1) | 01700 | Thermal management is enabled, the junction temperature exceeds the threshold specified in THRM1 or THRM2, and MSR[EE] = 1.
Reserved | 01800-02FFF | --
Note: 1 MPC750-specific
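Some of the entries in Table 1-5 translate directly into code. The C sketch below records the vector offsets and models two of the causing conditions (decrementer and instruction address breakpoint). Several details are assumptions drawn from the architecture rather than from this table: the MSR and IABR bit masks (MSR[EE] as bit 16, MSR[IR] as bit 26, IABR[BE] as bit 30, IABR[TE] as bit 31, in big-endian bit numbering) and the two possible vector prefixes selected by MSR[IP]. All identifiers are illustrative, and the model ignores details such as a decrementer exception remaining pending while MSR[EE] = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vector offsets from Table 1-5; enumerator names are hypothetical. */
enum exception_vector {
    VEC_SYSTEM_RESET = 0x00100, VEC_MACHINE_CHECK  = 0x00200,
    VEC_DSI          = 0x00300, VEC_ISI            = 0x00400,
    VEC_EXTERNAL     = 0x00500, VEC_ALIGNMENT      = 0x00600,
    VEC_PROGRAM      = 0x00700, VEC_FP_UNAVAILABLE = 0x00800,
    VEC_DECREMENTER  = 0x00900, VEC_SYSTEM_CALL    = 0x00C00,
    VEC_TRACE        = 0x00D00,
    VEC_PERF_MONITOR = 0x00F00, /* MPC750-specific from here on */
    VEC_IABR         = 0x01300, VEC_SMI            = 0x01400,
    VEC_THERMAL      = 0x01700
};

#define MSR_EE  0x00008000u  /* assumed: MSR bit 16 */
#define MSR_IR  0x00000020u  /* assumed: MSR bit 26 */
#define IABR_BE 0x00000002u  /* assumed: IABR bit 30 */
#define IABR_TE 0x00000001u  /* assumed: IABR bit 31 */

/* Physical vector address; the prefix is assumed to be one of two values. */
static uint32_t vector_address(uint32_t prefix, enum exception_vector vec)
{
    assert(prefix == 0x00000000u || prefix == 0xFFF00000u);
    return prefix | (uint32_t)vec;
}

/* Decrementer: most-significant bit of DEC goes 0 -> 1 while MSR[EE] = 1. */
static bool decrementer_fires(uint32_t dec_before, uint32_t dec_after, uint32_t msr)
{
    bool msb_rose = !(dec_before & 0x80000000u) && (dec_after & 0x80000000u);
    return msb_rose && (msr & MSR_EE);
}

/* IABR[0-29] == EA[0-29], IABR[TE] == MSR[IR], and IABR[BE] = 1. */
static bool iabr_matches(uint32_t iabr, uint32_t ea, uint32_t msr)
{
    bool addr_match = (iabr & ~3u) == (ea & ~3u);
    bool te_match = ((iabr & IABR_TE) != 0) == ((msr & MSR_IR) != 0);
    return addr_match && te_match && (iabr & IABR_BE);
}
```

As a usage example, a decrementer stepping from 0x00000000 to 0xFFFFFFFF with MSR[EE] set satisfies `decrementer_fires`, and the handler would then be entered at `vector_address(prefix, VEC_DECREMENTER)`.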
1.8 Memory Management
The following subsections describe the memory management features of the PowerPC architecture, and the MPC750 implementation, respectively. A detailed description of the MPC750 MMU implementation is provided in Chapter 5, "Memory Management."
1.8.1 PowerPC Memory Management Model
The primary functions of the MMU are to translate logical (effective) addresses to physical addresses for memory accesses and to provide access protection on blocks and pages of memory. There are two types of accesses generated by the MPC750 that require address translation--instruction accesses, and data accesses to memory generated by load, store, and cache control instructions. The PowerPC architecture defines different resources for 32- and 64-bit processors; the MPC750 implements the 32-bit memory management model. The memory-management model provides 4 Gbytes of logical address space accessible to supervisor and user programs with a 4-Kbyte page size and 256-Mbyte segment size. BAT block sizes range from 128 Kbyte to 256 Mbyte and are software selectable. In addition, it defines an interim 52-bit virtual address and hashed page tables for generating 32-bit physical addresses. The architecture also provides independent four-entry BAT arrays for instructions and data that maintain address translations for blocks of memory. These entries define blocks that can vary from 128 Kbytes to 256 Mbytes. The BAT arrays are maintained by system software. The PowerPC MMU and exception model support demand-paged virtual memory. Virtual memory management permits execution of programs larger than the size of physical memory; demand-paged implies that individual pages are loaded into physical memory from system memory only when they are first accessed by an executing program. The hashed page table is a variable-sized data structure that defines the mapping between virtual page numbers and physical page numbers. The page table size is a power of 2, and its starting address is a multiple of its size. The page table contains a number of page table entry groups (PTEGs). A PTEG contains eight page table entries (PTEs) of eight bytes each; therefore, each PTEG is 64 bytes long. PTEG addresses are entry points for table search operations. 
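The sizes quoted above fit together arithmetically, which a short C sketch can make explicit. It splits a 32-bit effective address into a 4-bit segment index, a 16-bit page index, and a 12-bit byte offset, and forms the 52-bit virtual address by placing a VSID in front of the page index and offset. The 24-bit VSID width is an assumption taken from the 32-bit architecture's segment register format (24 + 16 + 12 = 52), and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u   /* 4-Kbyte pages */
#define SEGMENT_SHIFT  28u   /* 256-Mbyte segments */
#define PTE_BYTES       8u
#define PTES_PER_PTEG   8u
#define PTEG_BYTES     (PTE_BYTES * PTES_PER_PTEG)   /* 64 bytes */

static unsigned segment_index(uint32_t ea) { return (unsigned)(ea >> SEGMENT_SHIFT); }
static uint32_t page_index(uint32_t ea)    { return (ea >> PAGE_SHIFT) & 0xFFFFu; }
static uint32_t byte_offset(uint32_t ea)   { return ea & 0xFFFu; }

/* 52-bit VA: assumed 24-bit VSID, then 16-bit page index, then 12-bit offset. */
static uint64_t virtual_address(uint32_t vsid, uint32_t ea)
{
    assert(vsid < (1u << 24));
    return ((uint64_t)vsid << 28) | (ea & 0x0FFFFFFFu);
}

/* BAT block length for step n: 128 Kbytes doubled n times, n = 0..11. */
static uint32_t bat_block_bytes(unsigned n)
{
    assert(n <= 11u);
    return (128u * 1024u) << n;
}
```

Eleven doublings of 128 Kbytes reach 256 Mbytes, which matches the stated range of BAT block sizes, and 4 + 16 + 12 bits account for the full 32-bit effective address.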
Setting MSR[IR] enables instruction address translations and MSR[DR] enables data address translations. If the bit is cleared, the respective effective address is the same as the physical address.
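The effect of MSR[IR] and MSR[DR] can be sketched as a simple selector in C. The bit masks assume MSR[IR] is bit 26 and MSR[DR] is bit 27 in big-endian bit numbering; `translate_fn` and `toy_mmu` are hypothetical placeholders for the real segment/BAT/page-table translation, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR 0x00000020u  /* assumed: instruction address translation enable */
#define MSR_DR 0x00000010u  /* assumed: data address translation enable */

/* Hypothetical stand-in for the MMU's translation of an effective address. */
typedef uint32_t (*translate_fn)(uint32_t ea);

static uint32_t physical_address(uint32_t msr, bool is_instruction,
                                 uint32_t ea, translate_fn mmu)
{
    uint32_t enable = is_instruction ? MSR_IR : MSR_DR;
    if (!(msr & enable))
        return ea;          /* translation off: EA is the physical address */
    assert(mmu != 0);
    return mmu(ea);
}

/* Placeholder mapping used only to make the two paths distinguishable. */
static uint32_t toy_mmu(uint32_t ea) { return ea ^ 0x80000000u; }
```

Instruction and data accesses are controlled independently, so with only MSR[IR] set a data access still uses its effective address unchanged.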
1.8.2 MPC750 Microprocessor Memory Management Implementation
The MPC750 implements separate MMUs for instructions and data. It implements a copy of the segment registers in the instruction MMU; however, read and write accesses (mfsr and mtsr) are handled through the segment registers implemented as part of the data MMU. The MPC750 MMU is described in Section 1.2.3, "Memory Management Units (MMUs)."
The R (referenced) bit is updated in the PTE in memory (if necessary) during a table search due to a TLB miss. Updates to the C (changed) bit are treated like TLB misses. A complete table search is performed and the entire TLB entry is rewritten to update the C bit.
1.9 Instruction Timing
The MPC750 is a pipelined, superscalar processor. A pipelined processor is one in which instruction processing is divided into discrete stages, allowing work to be done on different instructions in each stage. For example, after an instruction completes one stage, it can pass on to the next stage leaving the previous stage available to the subsequent instruction. This improves overall instruction throughput. A superscalar processor is one that issues multiple independent instructions into separate execution units, allowing instructions to execute in parallel. The MPC750 has six independent execution units, two for integer instructions, and one each for floating-point instructions, branch instructions, load/store instructions, and system register instructions. Having separate GPRs and FPRs allows integer, floating-point calculations, and load and store operations to occur simultaneously without interference. Additionally, rename buffers are provided to allow operations to post execution results for use by subsequent instructions without committing them to the architected FPRs and GPRs. As shown in Figure 1-6, the common pipeline of the MPC750 has four stages through which all instructions must pass--fetch, decode/dispatch, execute, and complete/write back. Some instructions occupy multiple stages simultaneously and some individual execution units have additional stages. For example, the floating-point pipeline consists of three stages through which all floating-point instructions must pass.
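A back-of-the-envelope model shows why pipelining and superscalar completion improve throughput. The C sketch below computes an idealized cycle count for a run of instructions, assuming every instruction spends one cycle in each of the four common stages and that up to two instructions complete per clock (the completion limit shown in Figure 1-6). It deliberately ignores stalls, data dependencies, multi-cycle execution, and branches, so it is a lower bound for intuition only. The function and constant names are illustrative.

```c
#include <assert.h>

enum { COMMON_STAGES = 4, MAX_COMPLETE_PER_CLOCK = 2 };

/* Ideal cycles: fill the pipeline once, then retire `width` per clock. */
static unsigned ideal_cycles(unsigned n_instr, unsigned stages, unsigned width)
{
    assert(stages > 0u && width > 0u);
    if (n_instr == 0u)
        return 0u;
    return stages + (n_instr + width - 1u) / width - 1u;
}
```

Under these assumptions, eight instructions need 7 cycles at two completions per clock, compared with 11 cycles for a scalar four-stage pipeline and 32 cycles with no pipelining at all.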
[Figure 1-6 shows the pipeline stages in order: Fetch, with the BPU (maximum four-instruction fetch per clock cycle); Dispatch (maximum three-instruction dispatch per clock cycle, including one branch instruction); Execute Stage (FPU1, FPU2, FPU3; IU1; IU2; LSU1, LSU2; SRU); Complete (Write-Back) (maximum two-instruction completion per clock cycle).]
Figure 1-6. Pipeline Diagram
Note that Figure 1-6 does not show features such as reservation stations and rename buffers, which reduce stalls and improve instruction throughput. The instruction pipeline in the MPC750 has four major pipeline stages, described as follows:
* The fetch pipeline stage primarily involves retrieving instructions from the memory system and determining the location of the next instruction fetch. The BPU decodes branches during the fetch stage and removes those that do not update CTR or LR from the instruction stream.
* The dispatch stage is responsible for decoding the instructions supplied by the instruction fetch stage and determining which instructions can be dispatched in the current cycle. If source operands for the instruction are available, they are read from the appropriate register file or rename register to the execute pipeline stage. If a source operand is not available, dispatch provides a tag that indicates which rename register will supply the operand when it becomes available. At the end of the dispatch stage, the dispatched instructions and their operands are latched by the appropriate execution unit. Instructions executed by the IUs, FPU, SRU, and LSU are dispatched from the bottom two positions in the instruction queue. In a single clock cycle, a maximum of two instructions can be dispatched to these execution units in any combination. When an instruction is dispatched, it is assigned a position in the six-entry completion queue. A branch instruction can be issued on the same clock cycle for a maximum three-instruction dispatch.
* During the execute pipeline stage, each execution unit that has an executable instruction executes the selected instruction (perhaps over multiple cycles), writes the instruction's result into the appropriate rename register, and notifies the completion stage that the instruction has finished execution. In the case of an internal exception, the execution unit reports the exception to the completion pipeline stage and (except for the FPU) discontinues instruction execution until the exception is handled. The exception is not signaled until that instruction is the next to be completed. Execution of most floating-point instructions is pipelined within the FPU, allowing up to three instructions to be executing in the FPU concurrently. The FPU stages are multiply, add, and round-convert. Execution of most load/store instructions is also pipelined. The load/store unit has two pipeline stages. The first stage is for effective address calculation and MMU translation and the second stage is for accessing the data in the cache.
* The complete pipeline stage maintains the correct architectural machine state and transfers execution results from the rename registers to the GPRs and FPRs (and CTR and LR, for some instructions) as instructions are retired. As with dispatching instructions from the instruction queue, instructions are retired from the two bottom positions in the completion queue. If completion logic detects an instruction causing an exception, all following instructions are cancelled, their execution results in rename registers are discarded, and instructions are fetched from the appropriate exception vector.
Because the PowerPC architecture can be applied to such a wide variety of implementations, instruction timing varies among processors of this family. For a detailed discussion of instruction timing with examples and a table of latencies for each execution unit, see Chapter 6, "Instruction Timing."
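As a rough illustration of the per-cycle limits quoted in this section, the following C sketch computes an idealized lower bound on the number of clock cycles needed to retire a run of non-branch instructions. It assumes that every instruction spends exactly one cycle in each of the four common stages, that fetch always keeps dispatch supplied, and that no stalls occur. The function and constant names are illustrative and do not come from this manual; actual timing depends on the latencies given in Chapter 6.

```c
#include <assert.h>
#include <stdint.h>

/* Per-cycle limits quoted in this section (non-branch instructions). */
enum {
    MPC750_PIPELINE_STAGES  = 4, /* fetch, decode/dispatch, execute, complete */
    MPC750_MAX_RETIRE_CYCLE = 2  /* two-instruction completion per clock */
};

/*
 * Idealized lower bound on cycles to retire n non-branch instructions.
 * Assumption (not from the manual): single-cycle execute and no stalls.
 * The first pair retires after 4 cycles; each further pair adds 1 cycle.
 */
static inline uint32_t mpc750_min_cycles(uint32_t n)
{
    assert(n < UINT32_MAX);          /* keeps the rounding below from wrapping */
    if (n == 0)
        return 0;
    uint32_t groups = (n + MPC750_MAX_RETIRE_CYCLE - 1) / MPC750_MAX_RETIRE_CYCLE;
    return (MPC750_PIPELINE_STAGES - 1) + groups;
}
```

Under these assumptions, ten instructions need at least eight cycles: three cycles to fill the pipeline plus five completion cycles of two instructions each.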
1.10 Power Management
The MPC750 provides four power modes, selectable by setting the appropriate control bits in the MSR and HID0 registers. The four power modes are as follows:
* Full-power--This is the default power state of the MPC750. The MPC750 is fully powered and the internal functional units are operating at the full processor clock speed. If the dynamic power management mode is enabled, functional units that are idle will automatically enter a low-power state without affecting performance, software execution, or external hardware.
* Doze--All the functional units of the MPC750 are disabled except for the time base/decrementer registers and the bus snooping logic. When the processor is in doze mode, an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check brings the MPC750 into the full-power state. The MPC750 in doze mode maintains the PLL in a fully powered state and locked to the system external clock input (SYSCLK) so a transition to the full-power state takes only a few processor clock cycles.
* Nap--The nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the PLL in a powered state. The MPC750 returns to the full-power state upon receipt of an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input (MCP). A return to full-power state from a nap state takes only a few processor clock cycles. When the processor is in nap mode, if QACK is negated, the processor is put in doze mode to support snooping.
* Sleep--Sleep mode minimizes power consumption by disabling all internal functional units, after which external system logic may disable the PLL and SYSCLK. Returning the MPC750 to the full-power state requires the enabling of the PLL and SYSCLK, followed by the assertion of an external asynchronous interrupt, a system management interrupt, a hard or soft reset, or a machine check input (MCP) signal after the time required to relock the PLL.
Chapter 10, "Power and Thermal Management," provides information about power saving and thermal management modes for the MPC750.
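The mode selection described above can be sketched in C as plain bit manipulation on register images. This is a minimal sketch, not a power management driver: it assumes MSR[POW] is bit 13 (Table 2-1) and that the DOZE, NAP, and SLEEP enables are HID0 bits 8, 9, and 10 (Section 2.1.2.2), using the PowerPC convention that bit 0 is the most significant bit. The type and function names are hypothetical, and the privileged mtspr/mtmsr accesses that would move these values into the processor are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers bits from the most significant end: bit 0 is the MSB. */
static inline uint32_t ppc_bit32(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << (31u - n);
}

#define MSR_POW    ppc_bit32(13)  /* power management enable (Table 2-1) */
#define HID0_DOZE  ppc_bit32(8)   /* doze mode enable */
#define HID0_NAP   ppc_bit32(9)   /* nap mode enable */
#define HID0_SLEEP ppc_bit32(10)  /* sleep mode enable */

typedef enum { PM_DOZE, PM_NAP, PM_SLEEP } pm_mode_t;  /* hypothetical type */

/* Return a new HID0 image with exactly one power-saving mode selected. */
static inline uint32_t hid0_select_mode(uint32_t hid0, pm_mode_t mode)
{
    hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP);
    switch (mode) {
    case PM_DOZE:  hid0 |= HID0_DOZE;  break;
    case PM_NAP:   hid0 |= HID0_NAP;   break;
    case PM_SLEEP: hid0 |= HID0_SLEEP; break;
    }
    return hid0;
}

/* Return an MSR image with POW set; writing it requests the selected mode. */
static inline uint32_t msr_request_power_save(uint32_t msr)
{
    return msr | MSR_POW;
}
```

Clearing all three mode bits before setting one keeps the HID0 image unambiguous about which mode a later MSR[POW] write requests.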
1.11 Thermal Management
The MPC750's thermal assist unit (TAU) provides a way to control heat dissipation. This ability is particularly useful in portable computers, which, due to power consumption and size limitations, cannot use desktop cooling solutions such as fans. Therefore, better heat sink designs coupled with intelligent thermal management are of critical importance for high performance portable systems. Primarily, the thermal management system monitors and regulates the system's operating temperature. For example, if the temperature is about to exceed a set limit, the system can be made to slow down or even suspend operations temporarily in order to lower the temperature. The thermal management facility also ensures that the processor's junction temperature does not exceed the operating specification. To avoid the inaccuracies that arise from measuring junction temperature with an external thermal sensor, the MPC750's on-chip thermal sensor and logic tightly couple the thermal management implementation. The TAU consists of a thermal sensor, digital-to-analog converter, comparator, control logic, and the dedicated SPRs described in Section 1.4, "PowerPC Registers and Programming Model." The TAU does the following:
* Compares the junction temperature against user-programmable thresholds
* Generates a thermal management interrupt if the temperature crosses the threshold
* Enables the user to estimate the junction temperature by way of a software successive approximation routine
The TAU is controlled through the privileged mtspr/mfspr instructions to the three SPRs provided for configuring and controlling the sensor control logic, which function as follows:
* THRM1 and THRM2 provide the ability to compare the junction temperature against two user-provided thresholds. Having dual thresholds gives the thermal management software finer control of the junction temperature. In single threshold mode, the thermal sensor output is compared to only one threshold in either THRM1 or THRM2.
* THRM3 is used to enable the TAU and to control the comparator output sample time. The thermal management logic manages the thermal management interrupt generation and time multiplexed comparisons in the dual threshold mode as well as other control functions.
Instruction cache throttling provides control of the MPC750's overall junction temperature by determining the interval at which instructions are fetched. This feature is accessed through the ICTC register. Chapter 10, "Power and Thermal Management," provides information about power saving and thermal management modes for the MPC750.
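The software successive approximation routine mentioned in this section can be pictured as a bit-by-bit search over the threshold value. The C sketch below is an illustration under stated assumptions, not the routine from Chapter 10: the comparator is abstracted as a hypothetical callback that stands in for programming a THRM1 or THRM2 threshold and reading back the comparison result, and the threshold width is a parameter because this section does not state it. A real routine would also have to wait for the sample time configured through THRM3 before trusting each comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical comparator hook: returns true when the junction temperature
 * is greater than or equal to `threshold`. On hardware this would program
 * a THRM1/THRM2 threshold and read back the comparison result.
 */
typedef bool (*tau_compare_fn)(uint8_t threshold, void *ctx);

/*
 * Successive approximation: decide one threshold bit per comparison,
 * from the most significant bit down. `bits` is the threshold width
 * (an assumption supplied by the caller, not taken from this section).
 * Returns the largest threshold t for which temperature >= t.
 */
static inline uint8_t tau_estimate(tau_compare_fn cmp, void *ctx, unsigned bits)
{
    assert(cmp != NULL && bits >= 1 && bits <= 8);
    uint8_t estimate = 0;
    for (unsigned i = bits; i-- > 0; ) {
        uint8_t trial = (uint8_t)(estimate | (1u << i));
        if (cmp(trial, ctx))
            estimate = trial;   /* temperature is at least `trial` */
    }
    return estimate;
}

/* Simulated sensor for experimentation: ctx points at the "true" temperature. */
static inline bool fake_compare(uint8_t threshold, void *ctx)
{
    return *(const uint8_t *)ctx >= threshold;
}
```

With a 7-bit threshold the search needs seven comparisons, and a temperature beyond the representable range saturates at the largest threshold.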
1.12 Performance Monitor
The MPC750 incorporates a performance monitor facility that system designers can use to help bring up, debug, and optimize software performance. The performance monitor counts events during execution of code, relating to dispatch, execution, completion, and memory accesses. The performance monitor incorporates several registers that can be read and written to by supervisor-level software. User-level versions of these registers provide read-only access for user-level applications. These registers are described in Section 1.4, "PowerPC Registers and Programming Model." Performance monitor control registers, MMCR0 or MMCR1, can be used to specify which events are to be counted and the conditions for which a performance monitoring interrupt is taken. Additionally, the sampled instruction address register, SIA (USIA), holds the address of the first instruction to complete after the counter overflowed. Attempting to write to a user-read-only performance monitor register causes a program exception, regardless of the MSR[PR] setting. When a performance monitoring interrupt occurs, program execution continues from vector offset 0x00F00. Chapter 11, "Performance Monitor," describes the operation of the performance monitor diagnostic tool incorporated in the MPC750.
Chapter 2 Programming Model
This chapter describes the MPC750 programming model, emphasizing those features specific to the MPC750 processor and summarizing those that are common to the processors that implement the PowerPC architecture. It consists of three major sections, which describe the following:
* Registers implemented in the MPC750
* Operand conventions
* The MPC750 instruction set
For detailed information about architecture-defined features, see the Programming Environments Manual. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
2.1 The MPC750 Processor Register Set
This section describes the registers implemented in the MPC750. It includes an overview of registers defined by the PowerPC architecture, highlighting differences in how these registers are implemented in the MPC750, and a detailed description of MPC750-specific registers. Full descriptions of the architecture-defined register set are provided in Chapter 2, "PowerPC Register Set," in the Programming Environments Manual. Registers are defined at all three levels of the PowerPC architecture--user instruction set architecture (UISA), virtual environment architecture (VEA), and operating environment architecture (OEA). The PowerPC architecture defines register-to-register operations for all computational instructions. Source data for these instructions are accessed from the on-chip registers or are provided as immediate values embedded in the opcode. The three-register instruction format allows specification of a target register distinct from the two source registers, thus preserving the original data for use by other instructions and reducing the number of instructions required for certain operations. Data is transferred between memory and registers with explicit load and store instructions only.
2.1.1 Register Set
The PowerPC UISA registers are user-level. General-purpose registers (GPRs) and floating-point registers (FPRs) are accessed through instruction operands. Access to registers can be explicit (by using instructions for that purpose such as Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction. Some registers are accessed both explicitly and implicitly. The registers implemented on the MPC750 are shown in Figure 2-1. The number to the right of the special-purpose registers (SPRs) indicates the number that is used in the syntax of the instruction operands to access the register (for example, the number used to access the integer exception register (XER) is SPR 1). These registers can be accessed using the mtspr and mfspr instructions.
SUPERVISOR MODEL--OEA
  Configuration Registers
    Hardware Implementation Registers (1): HID0 (SPR 1008), HID1 (SPR 1009)
    Processor Version Register: PVR (SPR 287)
    Machine State Register: MSR
  Memory Management Registers
    Instruction BAT Registers: IBAT0U (SPR 528), IBAT0L (SPR 529), IBAT1U (SPR 530), IBAT1L (SPR 531), IBAT2U (SPR 532), IBAT2L (SPR 533), IBAT3U (SPR 534), IBAT3L (SPR 535)
    Data BAT Registers: DBAT0U (SPR 536), DBAT0L (SPR 537), DBAT1U (SPR 538), DBAT1L (SPR 539), DBAT2U (SPR 540), DBAT2L (SPR 541), DBAT3U (SPR 542), DBAT3L (SPR 543)
    SDR1: SDR1 (SPR 25)
    Segment Registers: SR0-SR15
  Exception Handling Registers
    SPRGs: SPRG0 (SPR 272), SPRG1 (SPR 273), SPRG2 (SPR 274), SPRG3 (SPR 275)
    Data Address Register: DAR (SPR 19)
    DSISR: DSISR (SPR 18)
    Save and Restore Registers: SRR0 (SPR 26), SRR1 (SPR 27)
  Miscellaneous Registers
    External Access Register: EAR (SPR 282)
    Time Base (For Writing): TBL (SPR 284), TBU (SPR 285)
    Decrementer: DEC (SPR 22)
    Data Address Breakpoint Register: DABR (SPR 1013)
    Instruction Address Breakpoint Register (1): IABR (SPR 1010)
    L2 Control Register (1, 2): L2CR (SPR 1017)
  Performance Monitor Registers
    Performance Counters (1): PMC1 (SPR 953), PMC2 (SPR 954), PMC3 (SPR 957), PMC4 (SPR 958)
    Sampled Instruction Address (1): SIA (SPR 955)
    Monitor Control (1): MMCR0 (SPR 952), MMCR1 (SPR 956)
  Power/Thermal Management Registers
    Thermal Assist Unit Registers (1): THRM1 (SPR 1020), THRM2 (SPR 1021), THRM3 (SPR 1022)
    Instruction Cache Throttling Control Register (1): ICTC (SPR 1019)
USER MODEL--VEA
  Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269)
USER MODEL--UISA
  General-Purpose Registers: GPR0-GPR31
  Floating-Point Registers: FPR0-FPR31
  Condition Register: CR
  Floating-Point Status and Control Register: FPSCR
  XER: XER (SPR 1)
  Link Register: LR (SPR 8)
  Count Register: CTR (SPR 9)
  Performance Monitor Registers (For Reading)
    Performance Counters (1): UPMC1 (SPR 937), UPMC2 (SPR 938), UPMC3 (SPR 941), UPMC4 (SPR 942)
    Sampled Instruction Address (1): USIA (SPR 939)
    Monitor Control (1): UMMCR0 (SPR 936), UMMCR1 (SPR 940)
(1) These registers are MPC750-specific registers. They may not be supported by other processors that implement the PowerPC architecture.
(2) Not supported by the MPC740.
Figure 2-1. Programming Model--MPC750 Microprocessor Registers
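The SPR numbers in Figure 2-1 are the values written as the SPR operand of mtspr and mfspr. The C sketch below collects a few of them as constants and shows how the 10-bit SPR number is placed in the instruction word, with its two 5-bit halves swapped. The encoding details (primary opcode 31, extended opcodes 339 and 467, and the swapped halves) come from the PowerPC architecture definition rather than from this section, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* A few SPR numbers from Figure 2-1. */
enum mpc750_spr {
    SPR_XER = 1, SPR_LR = 8, SPR_CTR = 9,
    SPR_HID0 = 1008, SPR_HID1 = 1009, SPR_IABR = 1010,
    SPR_ICTC = 1019, SPR_THRM1 = 1020, SPR_THRM2 = 1021, SPR_THRM3 = 1022
};

/* Place the SPR number in instruction bits 11-20, low 5-bit half first. */
static inline uint32_t spr_field(unsigned spr)
{
    assert(spr < 1024);
    return ((uint32_t)(spr & 0x1Fu) << 16) | ((uint32_t)(spr >> 5) << 11);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
static inline uint32_t encode_mfspr(unsigned rd, unsigned spr)
{
    assert(rd < 32);
    return UINT32_C(0x7C0002A6) | ((uint32_t)rd << 21) | spr_field(spr);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static inline uint32_t encode_mtspr(unsigned spr, unsigned rs)
{
    assert(rs < 32);
    return UINT32_C(0x7C0003A6) | ((uint32_t)rs << 21) | spr_field(spr);
}
```

For example, `encode_mfspr(0, SPR_LR)` yields 0x7C0802A6, the word an assembler emits for the simplified mnemonic mflr r0.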
Implementation Note--The MPC750 fully decodes the SPR field of the instruction. If the SPR specified is undefined, the illegal instruction program exception occurs. The user-level registers are described as follows:
* User-level registers (UISA)--The user-level registers can be accessed by all software with either user or supervisor privileges. They include the following:
  -- General-purpose registers (GPRs). The thirty-two GPRs (GPR0-GPR31) serve as data source or destination registers for integer instructions and provide data for generating addresses. See "General Purpose Registers (GPRs)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
  -- Floating-point registers (FPRs). The thirty-two FPRs (FPR0-FPR31) serve as the data source or destination for all floating-point instructions. See "Floating-Point Registers (FPRs)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
  -- Condition register (CR). The 32-bit CR consists of eight 4-bit fields, CR0-CR7, that reflect results of certain arithmetic operations and provide a mechanism for testing and branching. See "Condition Register (CR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
  -- Floating-point status and control register (FPSCR). The FPSCR contains all floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE 754 standard. See "Floating-Point Status and Control Register (FPSCR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
  The remaining user-level registers are SPRs. Note that the PowerPC architecture provides a separate mechanism for accessing SPRs (the mtspr and mfspr instructions). These instructions are commonly used to explicitly access certain registers, while other SPRs may be more typically accessed as the side effect of executing other instructions.
  -- Integer exception register (XER). The XER indicates overflow and carries for integer operations. See "XER Register (XER)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information. Implementation Note--To allow emulation of the lscbx instruction defined by the POWER architecture, XER[16-23] are implemented so that they can be read with mfspr[XER] and written with mtxer[XER] instructions.
  -- Link register (LR). The LR provides the branch target address for the Branch Conditional to Link Register (bclrx) instruction, and can be used to hold the logical address of the instruction that follows a branch and link instruction, typically used for linking to subroutines. See "Link Register (LR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
  -- Count register (CTR). The CTR holds a loop count that can be decremented during execution of appropriately coded branch instructions. The CTR can also provide the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. See "Count Register (CTR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
* User-level registers (VEA)--The PowerPC VEA defines the time base facility (TB), which consists of two 32-bit registers--time base upper (TBU) and time base lower (TBL). The time base registers can be written to only by supervisor-level instructions but can be read by both user- and supervisor-level software. For more information, see "PowerPC VEA Register Set--Time Base," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
* Supervisor-level registers (OEA)--The OEA defines the registers an operating system uses for memory management, configuration, exception handling, and other operating system functions. The OEA defines the following supervisor-level registers for 32-bit implementations:
  -- Configuration registers
    - Machine state register (MSR). The MSR defines the state of the processor. The MSR can be modified by the Move to Machine State Register (mtmsr), System Call (sc), and Return from Exception (rfi) instructions. It can be read by the Move from Machine State Register (mfmsr) instruction. When an exception is taken, the contents of the MSR are saved to the machine status save/restore register 1 (SRR1), which is described below. See "Machine State Register (MSR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information. Implementation Note--Table 2-1 describes MSR bits the MPC750 implements that are not required by the PowerPC architecture.
Table 2-1. Additional MSR Bits
Bit   Name   Description
13    POW    Power management enable. Optional to the PowerPC architecture.
             0 Power management is disabled.
             1 Power management is enabled. The processor can enter a power-saving mode when additional conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the hardware implementation-dependent register 0 (HID0), described in Table 2-4.
29    PM     Performance monitor marked mode. This bit is specific to the MPC750, and is defined as reserved by the PowerPC architecture. See Chapter 11, "Performance Monitor."
             0 Process is not a marked process.
             1 Process is a marked process.
Note that clearing MSR[EE] masks not only the architecture-defined external interrupt and decrementer exceptions but also the MPC750-specific system management, performance monitor, and thermal management exceptions.
    - Processor version register (PVR). This register is a read-only register that identifies the version (model) and revision level of the processor. For more information, see "Processor Version Register (PVR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual. Implementation Note--The processor version number is 0x0008 for the MPC750. The processor revision level starts at 0x0100 and is updated for each silicon revision.
  -- Memory management registers
    - Block-address translation (BAT) registers. The PowerPC OEA includes an array of block address translation registers that can be used to specify four blocks of instruction space and four blocks of data space. The BAT registers are implemented in pairs--four pairs of instruction BATs (IBAT0U-IBAT3U and IBAT0L-IBAT3L) and four pairs of data BATs (DBAT0U-DBAT3U and DBAT0L-DBAT3L). Figure 2-1 lists the SPR numbers for the BAT registers. For more information, see "BAT Registers," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual. Because BAT upper and lower words are loaded separately, software must ensure that BAT translations are correct during the time that both BAT entries are being loaded. The MPC750 implements the G bit in the IBAT registers; however, attempting to execute code from an IBAT area with G = 1 causes an ISI exception. This complies with the revision of the architecture described in the Programming Environments Manual.
    - SDR1. The SDR1 register specifies the page table base address used in virtual-to-physical address translation. See "SDR1," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
    - Segment registers (SR). The PowerPC OEA defines sixteen 32-bit segment registers (SR0-SR15). Note that the SRs are implemented on 32-bit implementations only. The fields in the segment register are interpreted differently depending on the value of bit 0. See "Segment Registers," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information. Note that the MPC750 implements separate memory management units (MMUs) for instruction and data. It associates the architecture-defined SRs with the data MMU (DMMU). It reflects the values of the SRs in separate, so-called "shadow" segment registers in the instruction MMU (IMMU).
  -- Exception-handling registers
    - Data address register (DAR). After a DSI or an alignment exception, DAR is set to the effective address (EA) generated by the faulting instruction. See "Data Address Register (DAR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
    - SPRG0-SPRG3. The SPRG0-SPRG3 registers are provided for operating system use. See "SPRG0-SPRG3," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
    - DSISR. The DSISR register defines the cause of DSI and alignment exceptions. See "DSISR," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
    - Machine status save/restore register 0 (SRR0). The SRR0 register is used to save the address of the instruction at which execution continues when rfi executes at the end of an exception handler routine. See "Machine Status Save/Restore Register 0 (SRR0)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
    - Machine status save/restore register 1 (SRR1). The SRR1 register is used to save machine status on exceptions and to restore machine status when rfi executes. See "Machine Status Save/Restore Register 1 (SRR1)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information. Implementation Note--When a machine check exception occurs, the MPC750 sets one or more error bits in SRR1. Table 2-2 describes SRR1 bits the MPC750 implements that are not required by the PowerPC architecture.
Table 2-2. Additional SRR1 Bits
Bit   Name    Description
11    L2DP    Set by a data parity error on the L2 bus. The MPC740 does not implement the L2 cache interface.
12    MCPIN   Set by the assertion of MCP
13    TEA     Set by a TEA assertion on the 60x bus
14    DP      Set by a data parity error on the 60x bus
15    AP      Set by an address parity error on the 60x bus
  -- Miscellaneous registers
    - Time base (TB). The TB is a 64-bit structure provided for maintaining the time of day and operating interval timers. The TB consists of two 32-bit registers--time base upper (TBU) and time base lower (TBL). The time base registers can be written to only by supervisor-level software, but can be read by both user- and supervisor-level software. See "Time Base Facility (TB)--OEA," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
    - Decrementer register (DEC). This register is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay; the frequency is a subdivision of the processor clock. See "Decrementer Register (DEC)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information. Implementation Note--In the MPC750 the decrementer register is decremented and the time base is incremented at a speed that is one-fourth the speed of the bus clock.
    - Data address breakpoint register (DABR)--This optional register is used to cause a breakpoint exception if a specified data address is encountered. See "Data Address Breakpoint Register (DABR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
    - External access register (EAR). This optional register is used in conjunction with eciwx and ecowx. Note that the EAR register and the eciwx and ecowx instructions are optional in the PowerPC architecture and may not be supported in all processors that implement the OEA. See "External Access Register (EAR)," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for more information.
* MPC750-specific registers--The PowerPC architecture allows implementation-specific SPRs. Those incorporated in the MPC750 are described as follows. Note that in the MPC750, these registers are all supervisor-level registers.
  -- Instruction address breakpoint register (IABR)--This register can be used to cause a breakpoint exception if a specified instruction address is encountered.
  -- Hardware implementation-dependent register 0 (HID0)--This register controls various functions, such as enabling checkstop conditions, and locking, enabling, and invalidating the instruction and data caches.
  -- Hardware implementation-dependent register 1 (HID1)--This register reflects the state of PLL_CFG[0-3] clock signals.
  -- The L2 cache control register (L2CR) is used to configure and operate the L2 cache. It includes bits for enabling parity checking, setting the L2-to-processor clock ratio, and identifying the type of RAM used for the L2 cache implementation. (Not supported in the MPC740.)
  -- Performance monitor registers. The following registers are used to define and count events for use by the performance monitor:
    - The performance monitor counter registers (PMC1-PMC4) are used to record the number of times a certain event has occurred. UPMC1-UPMC4 provide user-level read access to these registers.
    - The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitor interrupt functions. UMMCR0-UMMCR1 provide user-level read access to these registers.
    - The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. USIA provides user-level read access to the SIA.
    - The MPC750 does not implement the sampled data address register (SDA) or the user-level, read-only USDA registers. However, for compatibility with processors that do, those registers can be written to by boot code without causing an exception. SDA is SPR 959; USDA is SPR 943.
  -- The instruction cache throttling control register (ICTC) has bits for enabling the instruction cache throttling feature and for controlling the interval at which instructions are forwarded to the instruction buffer in the fetch unit. This provides control over the processor's overall junction temperature.
  -- Thermal management registers (THRM1, THRM2, and THRM3). Used to enable and set thresholds for the thermal management facility.
    - THRM1 and THRM2 provide the ability to compare the junction temperature against two user-provided thresholds. The dual thresholds allow the thermal management software differing degrees of action in lowering the junction temperature. The TAU can also be operated in a single threshold mode in which the thermal sensor output is compared to only one threshold in either THRM1 or THRM2.
    - THRM3 is used to enable the thermal assist unit (TAU) and to control the comparator output sample time.
Note that while it is not guaranteed that the implementation of MPC750-specific registers is consistent among processors of this family, other processors may implement similar or identical registers.
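Several of the fields described in this section reduce to mask-and-shift operations, which the following C sketch illustrates. It takes the MSR bit numbers from Table 2-1, the SRR1 bit numbers from Table 2-2, the MPC750 version number 0x0008 and the bus-clock/4 decrementer rate from the implementation notes above. The split of the PVR into an upper version halfword and a lower revision halfword is the architecture-defined layout rather than something stated in this section, and all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC bit numbering: bit 0 is the most significant bit of the word. */
static inline uint32_t ppc_bit32(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << (31u - n);
}

/* MPC750-specific MSR bits (Table 2-1). */
#define MSR_POW     ppc_bit32(13)
#define MSR_PM      ppc_bit32(29)

/* Machine check cause bits saved in SRR1 (Table 2-2). */
#define SRR1_L2DP   ppc_bit32(11)
#define SRR1_MCPIN  ppc_bit32(12)
#define SRR1_TEA    ppc_bit32(13)
#define SRR1_DP     ppc_bit32(14)
#define SRR1_AP     ppc_bit32(15)

/* PVR: version in the upper halfword, revision in the lower halfword. */
static inline uint16_t pvr_version(uint32_t pvr)  { return (uint16_t)(pvr >> 16); }
static inline uint16_t pvr_revision(uint32_t pvr) { return (uint16_t)(pvr & 0xFFFFu); }

/* True when the version field carries the MPC750 version number 0x0008. */
static inline int pvr_is_mpc750(uint32_t pvr)
{
    return pvr_version(pvr) == 0x0008;
}

/*
 * DEC reload value for a delay in microseconds, given that the decrementer
 * counts at one-fourth of the bus clock. Saturates at the 32-bit maximum.
 */
static inline uint32_t dec_ticks_for_us(uint32_t bus_hz, uint32_t delay_us)
{
    uint64_t ticks = ((uint64_t)bus_hz / 4u) * delay_us / 1000000u;
    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

With a 100 MHz bus clock, for instance, the decrementer counts at 25 MHz, so a 1 ms delay corresponds to 25,000 ticks.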
2.1.2 MPC750-Specific Registers
This section describes registers that are defined for the MPC750 but are not included in the PowerPC architecture.
2.1.2.1 Instruction Address Breakpoint Register (IABR)
The instruction address breakpoint register (IABR), shown in Figure 2-2, supports the instruction address breakpoint exception. When this exception is enabled, instruction fetch addresses are compared with an effective address stored in the IABR. If the word specified in the IABR is fetched, the instruction breakpoint handler is invoked. The instruction that triggers the breakpoint does not execute before the handler is invoked. For more information, see Section 4.5.14, "Instruction Address Breakpoint Exception (0x01300)." The IABR can be accessed with mtspr and mfspr using SPR 1010.
[Figure 2-2 shows the IABR layout: bits 0-29, Address; bit 30, BE; bit 31, TE.]
Figure 2-2. Instruction Address Breakpoint Register
The IABR bits are described in Table 2-3.
Table 2-3. Instruction Address Breakpoint Register Bit Settings
Bits   Name      Description
0-29   Address   Word address to be compared
30     BE        Breakpoint enabled. Setting this bit indicates that breakpoint checking is to be done.
31     TE        Translation enabled. An IABR match is signaled if this bit matches MSR[IR].
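The field layout in Table 2-3 can be exercised with a short C sketch that builds an IABR value and models the match rule in software. This is an illustration only: the names are hypothetical, the masks follow from the PowerPC convention that bit 0 is the most significant bit (so BE and TE are the two least significant bits), and writing the value to SPR 1010 with mtspr is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IABR_ADDR_MASK UINT32_C(0xFFFFFFFC) /* bits 0-29: word address */
#define IABR_BE        UINT32_C(0x00000002) /* bit 30: breakpoint enabled */
#define IABR_TE        UINT32_C(0x00000001) /* bit 31: translation enabled */

/* Build an IABR image for a word-aligned instruction address. */
static inline uint32_t iabr_make(uint32_t ea, bool enable, bool translated)
{
    assert((ea & 3u) == 0);  /* instructions are word aligned */
    return (ea & IABR_ADDR_MASK)
         | (enable ? IABR_BE : 0u)
         | (translated ? IABR_TE : 0u);
}

/*
 * Software model of the match rule in Table 2-3: a fetch of `fetch_ea`
 * matches when BE is set, the word addresses agree, and TE equals MSR[IR].
 */
static inline bool iabr_matches(uint32_t iabr, uint32_t fetch_ea, bool msr_ir)
{
    if (!(iabr & IABR_BE))
        return false;
    if (((iabr ^ fetch_ea) & IABR_ADDR_MASK) != 0)
        return false;
    return ((iabr & IABR_TE) != 0) == msr_ir;
}
```

Keeping TE in step with the MSR[IR] setting of the code being debugged matters here, because a breakpoint armed with TE = 1 never matches fetches made with instruction translation disabled.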
2.1.2.2 Hardware Implementation-Dependent Register 0
The hardware implementation-dependent register 0 (HID0) controls the state of several functions within the MPC750. The HID0 register is shown in Figure 2-3.
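[Register layout: EMCP (0), DBP (1), EBA (2), EBD (3), BCLK (4), reserved (5), ECLK (6), PAR (7), DOZE (8), NAP (9), SLEEP (10), DPM (11), reserved (12-14), NHR (15), ICE (16), DCE (17), ILOCK (18), DLOCK (19), ICFI (20), DCFI (21), SPD (22), IFEM (23), SGE (24), DCFA (25), BTIC (26), reserved (27), ABE (28), BHT (29), reserved (30), NOOPTI (31)]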
Figure 2-3. Hardware Implementation-Dependent Register 0 (HID0)
The HID0 bits are described in Table 2-4.
Table 2-4. HID0 Bit Functions
Bits   Name    Function
0      EMCP    Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused by assertion of MCP, similar to how MSR[EE] can mask external interrupts.
               0 Masks MCP. Asserting MCP does not generate a machine check exception or a checkstop.
               1 Asserting MCP causes checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
1      DBP     Disable 60x bus address and data parity generation.
               0 The system generates address and data parity.
               1 Parity generation is disabled and parity signals are driven to 0 during bus operations. When parity generation is disabled, all parity checking should also be disabled and parity signals need not be connected.
2      EBA     Enable/disable 60x bus address parity checking.
               0 Prevents address parity checking.
               1 Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
               EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
3      EBD     Enable 60x bus data parity checking.
               0 Parity checking is disabled.
               1 Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
               EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
4      BCLK    CLK_OUT output enable and clock type selection. Used in conjunction with HID0[ECLK] and the HRESET signal to configure CLK_OUT. See Table 2-5.
5      --      Not used. Defined as EICE on some earlier processors.
6      ECLK    CLK_OUT output enable and clock type selection. Used in conjunction with HID0[BCLK] and the HRESET signal to configure CLK_OUT. See Table 2-5.
7      PAR     Disable precharge of ARTRY.
               0 Precharge of ARTRY enabled.
               1 Alters bus protocol slightly by preventing the processor from driving ARTRY to the high (negated) state. If this is done, the system must restore the signals to the high state.
8      DOZE    Doze mode enable. Operates in conjunction with MSR[POW].
               0 Doze mode disabled.
               1 Doze mode enabled. Doze mode is invoked by setting MSR[POW] while this bit is set. In doze mode, the PLL, time base, and snooping remain active.
9      NAP     Nap mode enable. Operates in conjunction with MSR[POW].
               0 Nap mode disabled.
               1 Nap mode enabled. Nap mode is invoked by setting MSR[POW] while this bit is set. In nap mode, the PLL and the time base remain active.
10     SLEEP   Sleep mode enable. Operates in conjunction with MSR[POW].
               0 Sleep mode disabled.
               1 Sleep mode enabled. Sleep mode is invoked by setting MSR[POW] while this bit is set. QREQ is asserted to indicate that the processor is ready to enter sleep mode. If the system logic determines that the processor may enter sleep mode, the quiesce acknowledge signal, QACK, is asserted back to the processor. Once QACK assertion is detected, the processor enters sleep mode after several processor clocks. At this point, the system logic may turn off the PLL by first configuring PLL_CFG[0-3] to PLL bypass mode, then disabling SYSCLK.
11     DPM     Dynamic power management enable.
               0 Dynamic power management is disabled.
               1 Functional units enter a low-power mode automatically if the unit is idle. This does not affect operational performance and is transparent to software or any external hardware.
12-14  --      Not used.
15     NHR     Not hard reset (software use only). Helps software distinguish a hard reset from a soft reset.
               0 A hard reset occurred if software had previously set this bit.
               1 A hard reset has not occurred. If software sets this bit after a hard reset, when a reset occurs and this bit remains set, software can tell it was a soft reset.
16     ICE     Instruction cache enable.
               0 The instruction cache is neither accessed nor updated. All pages are accessed as if they were marked cache-inhibited (WIM = x1x). Potential cache accesses from the bus (snoop and cache operations) are ignored. In the disabled state for the L1 caches, the cache tag state bits are ignored and all accesses are propagated to the L2 cache or bus as single-beat transactions. For those transactions, however, CI reflects the original state determined by address translation regardless of cache disabled status. ICE is zero at power-up.
               1 The instruction cache is enabled.
17     DCE     Data cache enable.
               0 The data cache is neither accessed nor updated. All pages are accessed as if they were marked cache-inhibited (WIM = x1x). Potential cache accesses from the bus (snoop and cache operations) are ignored. In the disabled state for the L1 caches, the cache tag state bits are ignored and all accesses are propagated to the L2 cache or bus as single-beat transactions. For those transactions, however, CI reflects the original state determined by address translation regardless of cache disabled status. DCE is zero at power-up.
               1 The data cache is enabled.
18     ILOCK   Instruction cache lock.
               0 Normal operation.
               1 Instruction cache is locked. A locked cache supplies data normally on a hit but is treated as a cache-inhibited transaction on a miss. On a miss, the transaction to the bus or the L2 cache is single-beat; however, CI still reflects the original state as determined by address translation independent of cache locked or disabled status.
               To prevent locking during a cache access, an isync instruction must precede the setting of ILOCK.
19     DLOCK   Data cache lock.
               0 Normal operation.
               1 Data cache is locked. A locked cache supplies data normally on a hit but is treated as a cache-inhibited transaction on a miss. On a miss, the transaction to the bus or the L2 cache is single-beat; however, CI still reflects the original state as determined by address translation independent of cache locked or disabled status. A snoop hit to a locked L1 data cache performs as if the cache were not locked. A cache block invalidated by a snoop remains invalid until the cache is unlocked.
               To prevent locking during a cache access, a sync instruction must precede the setting of DLOCK.
20     ICFI    Instruction cache flash invalidate.
               0 The instruction cache is not invalidated. The bit is cleared when the invalidation operation begins (usually the next cycle after the write operation to the register). The instruction cache must be enabled for the invalidation to occur.
               1 An invalidate operation is issued that marks the state of each instruction cache block as invalid without writing back modified cache blocks to memory. Cache access is blocked during this time. Bus accesses to the cache are signaled as a miss during invalidate-all operations. Setting ICFI clears all the valid bits of the blocks and the PLRU bits to point to way L0 of each set. Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically resets these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0).
               Note that in the MPC603e processors, the proper use of the ICFI and DCFI bits was to set them and clear them in two consecutive mtspr operations. Software that already has this sequence of operations does not need to be changed to run on the MPC750.
21     DCFI    Data cache flash invalidate.
               0 The data cache is not invalidated. The bit is cleared when the invalidation operation begins (usually the next cycle after the write operation to the register). The data cache must be enabled for the invalidation to occur.
               1 An invalidate operation is issued that marks the state of each data cache block as invalid without writing back modified cache blocks to memory. Cache access is blocked during this time. Bus accesses to the cache are signaled as a miss during invalidate-all operations. Setting DCFI clears all the valid bits of the blocks and the PLRU bits to point to way L0 of each set. Once the L1 flash invalidate bits are set through an mtspr operation, hardware automatically resets these bits in the next cycle (provided that the corresponding cache enable bits are set in HID0).
               Note that in the MPC603e processors, the proper use of the ICFI and DCFI bits was to set them and clear them in two consecutive mtspr operations. Software that already has this sequence of operations does not need to be changed to run on the MPC750.
22     SPD     Speculative cache access disable.
               0 Speculative bus accesses to nonguarded space (G = 0) from both the instruction and data caches are enabled.
               1 Speculative bus accesses to nonguarded space in both caches are disabled.
23     IFEM    Enable M bit on bus for instruction fetches.
               0 M bit not reflected on bus for instruction fetches. Instruction fetches are treated as nonglobal on the bus.
               1 Instruction fetches reflect the M bit from the WIM settings.
24     SGE     Store gathering enable.
               0 Store gathering is disabled.
               1 Integer store gathering is performed for write-through to nonguarded space or for cache-inhibited stores to nonguarded space for 4-byte, word-aligned stores. The LSU combines stores to form a double word that is sent out on the 60x bus as a single-beat operation. Stores are gathered only if successive, eligible stores are queued and pending. Store gathering is performed regardless of address order or endian mode.
25     DCFA    Data cache flush assist. (Force data cache to ignore invalid sets on miss replacement selection.)
               0 The data cache flush assist facility is disabled.
               1 The miss replacement algorithm ignores invalid entries and follows the replacement sequence defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions to eight per set. The bit should be set just before beginning a cache flush routine and should be cleared when the series of instructions is complete.
26     BTIC    BTIC enable. Used to enable use of the 64-entry branch target instruction cache.
               0 The BTIC contents are invalidated and the BTIC behaves as if it were empty. New entries cannot be added until the BTIC is enabled.
               1 The BTIC is enabled and new entries can be added.
27     --      Not used. Defined as FBIOB on earlier 603-type processors.
28     ABE     Address broadcast enable. Controls whether certain address-only operations (such as cache operations, eieio, and sync) are broadcast on the 60x bus.
               0 Address-only operations affect only local L1 and L2 caches and are not broadcast.
               1 Address-only operations are broadcast on the 60x bus. Affected instructions are eieio, sync, dcbi, dcbf, and dcbst. A sync instruction completes only after a successful broadcast. Execution of eieio causes a broadcast that may be used to prevent any external devices, such as a bus bridge chip, from store gathering.
               Note that dcbz (with M = 1, coherency required) always broadcasts on the 60x bus regardless of the setting of this bit. An icbi is never broadcast. No cache operations, except dcbz, are snooped by the MPC750 regardless of whether ABE is set. Bus activity caused by these instructions results directly from performing the operation on the MPC750 cache.
29     BHT     Branch history table enable.
               0 BHT disabled. The MPC750 uses static branch prediction as defined by the PowerPC architecture (UISA) for those branch instructions the BHT would have otherwise used to predict (that is, those that use the CR as the only mechanism to determine direction). For more information on static branch prediction, see "Conditional Branch Control," in Chapter 4 of The Programming Environments Manual.
               1 Allows the use of the 512-entry branch history table (BHT).
               The BHT is disabled at power-on reset. All entries are set to weakly not-taken.
30     --      Not used.
31     NOOPTI  No-op the data cache touch instructions.
               0 The dcbt and dcbtst instructions are enabled.
               1 The dcbt and dcbtst instructions are no-oped globally.
Table 2-5 shows how HID0[BCLK], HID0[ECLK], and HRESET are used to configure CLK_OUT. See Section 7.2.11.2, "Clock Out (CLK_OUT)--Output," for more information.
Table 2-5. HID0[BCLK] and HID0[ECLK] CLK_OUT Configuration
HRESET     HID0[ECLK]  HID0[BCLK]  CLK_OUT
Asserted   x           x           Bus
Negated    0           0           High impedance
Negated    0           1           Bus/2
Negated    1           0           Core
Negated    1           1           Bus
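Read row by row, Table 2-5 amounts to a small decision function. The sketch below is one way to write it down, assuming the rows pair up in the order listed (HRESET asserted gives Bus; then ECLK/BCLK = 00, 01, 10, 11 give high impedance, Bus/2, Core, and Bus). The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for the four CLK_OUT behaviours listed in Table 2-5. */
typedef enum {
    CLK_OUT_HIGH_Z,   /* output not driven */
    CLK_OUT_BUS,      /* bus clock */
    CLK_OUT_BUS_DIV2, /* bus clock divided by two */
    CLK_OUT_CORE      /* core clock */
} clk_out_t;

/* Decode the CLK_OUT configuration from HRESET and the two HID0 bits. */
static clk_out_t clk_out_mode(bool hreset_asserted, bool eclk, bool bclk)
{
    if (hreset_asserted)
        return CLK_OUT_BUS;              /* ECLK and BCLK are don't-cares */
    if (!eclk)
        return bclk ? CLK_OUT_BUS_DIV2 : CLK_OUT_HIGH_Z;
    return bclk ? CLK_OUT_BUS : CLK_OUT_CORE;
}
```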
HID0 can be accessed with mtspr and mfspr using SPR 1008.
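Because the manual numbers bits from the most-significant end, a bit n of HID0 corresponds to the C mask 1 << (31 - n). The sketch below derives a few masks from Table 2-4 that way. The PPC_BIT macro and the helper hid0_select_power_mode are hypothetical, and the helper assumes software wants exactly one of DOZE, NAP, or SLEEP set before it sets MSR[POW], which this section does not state explicitly.

```c
#include <stdint.h>

/* Convert a PowerPC (MSB = bit 0) bit number into a 32-bit mask. */
#define PPC_BIT(n) (UINT32_C(1) << (31 - (n)))

#define HID0_DOZE  PPC_BIT(8)
#define HID0_NAP   PPC_BIT(9)
#define HID0_SLEEP PPC_BIT(10)
#define HID0_DPM   PPC_BIT(11)
#define HID0_ICE   PPC_BIT(16)
#define HID0_DCE   PPC_BIT(17)
#define HID0_ICFI  PPC_BIT(20)
#define HID0_DCFI  PPC_BIT(21)

#define HID0_POWER_MODES (HID0_DOZE | HID0_NAP | HID0_SLEEP)

/* Hypothetical helper: return a HID0 image with exactly one power-saving
 * mode bit selected; all other HID0 bits are preserved. */
static uint32_t hid0_select_power_mode(uint32_t hid0, uint32_t mode_bit)
{
    hid0 &= ~HID0_POWER_MODES;
    return hid0 | (mode_bit & HID0_POWER_MODES);
}
```

The returned value would then be written back with mtspr to SPR 1008 by supervisor-level code.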
2.1.2.3
Hardware Implementation-Dependent Register 1
The hardware implementation-dependent register 1 (HID1) reflects the state of the PLL_CFG[0-3] signals. The HID1 bits are shown in Figure 2-4.
[Register layout: PC0 (0), PC1 (1), PC2 (2), PC3 (3), reserved (4-31)]
Figure 2-4. Hardware Implementation-Dependent Register 1 (HID1)
The HID1 bits are described in Table 2-6.
Table 2-6. HID1 Bit Functions
Bit(s)  Name  Description
0       PC0   PLL configuration bit 0 (read-only)
1       PC1   PLL configuration bit 1 (read-only)
2       PC2   PLL configuration bit 2 (read-only)
3       PC3   PLL configuration bit 3 (read-only)
4-31    --    Reserved
Note: The clock configuration bits reflect the state of the PLL_CFG[0-3] signals.
HID1 can be accessed with mtspr and mfspr using SPR 1009.
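Since PC0-PC3 occupy bits 0-3, the PLL_CFG[0-3] value sits in the top nibble of a HID1 read. A minimal sketch of the extraction follows; the function name is hypothetical, and PC0 is treated as the most-significant bit of the returned value.

```c
#include <stdint.h>

/* Extract PLL_CFG[0-3] from a value read from HID1 (SPR 1009).
 * PC0 is register bit 0, the most-significant bit of the word. */
static unsigned hid1_pll_cfg(uint32_t hid1)
{
    return (unsigned)(hid1 >> 28) & 0xFu;
}
```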
2.1.2.4
Performance Monitor Registers
This section describes the registers used by the performance monitor, which is described in Chapter 11, "Performance Monitor."
2.1.2.4.1
Monitor Mode Control Register 0 (MMCR0)
The monitor mode control register 0 (MMCR0), shown in Figure 2-5, is a 32-bit SPR provided to specify events to be counted and recorded. The MMCR0 can be accessed only in supervisor mode. User-level software can read the contents of MMCR0 by issuing an mfspr instruction to UMMCR0, described in Section 2.1.2.4.2, "User Monitor Mode Control Register 0 (UMMCR0)."
[Register layout: DIS (0), DP (1), DU (2), DMS (3), DMR (4), ENINT (5), DISCOUNT (6), RTCSELECT (7-8), INTONBITTRANS (9), THRESHOLD (10-15), PMC1INTCONTROL (16), PMC2INTCONTROL (17), PMCTRIGGER (18), PMC1SELECT (19-25), PMC2SELECT (26-31)]
Figure 2-5. Monitor Mode Control Register 0 (MMCR0)
This register must be cleared at power up. Reading this register does not change its contents. The bits of the MMCR0 register are described in Table 2-7.
Table 2-7. MMCR0 Bit Settings
Bits   Name            Description
0      DIS             Disables counting unconditionally.
                       0 The values of the PMCn counters can be changed by hardware.
                       1 The values of the PMCn counters cannot be changed by hardware.
1      DP              Disables counting while in supervisor mode.
                       0 The PMCn counters can be changed by hardware.
                       1 If the processor is in supervisor mode (MSR[PR] is cleared), the counters are not changed by hardware.
2      DU              Disables counting while in user mode.
                       0 The PMCn counters can be changed by hardware.
                       1 If the processor is in user mode (MSR[PR] is set), the PMCn counters are not changed by hardware.
3      DMS             Disables counting while MSR[PM] is set.
                       0 The PMCn counters can be changed by hardware.
                       1 If MSR[PM] is set, the PMCn counters are not changed by hardware.
4      DMR             Disables counting while MSR[PM] is zero.
                       0 The PMCn counters can be changed by hardware.
                       1 If MSR[PM] is cleared, the PMCn counters are not changed by hardware.
5      ENINT           Enables performance monitor interrupt signaling.
                       0 Interrupt signaling is disabled.
                       1 Interrupt signaling is enabled.
                       Cleared by hardware when a performance monitor interrupt is signaled. To reenable these interrupt signals, software must set this bit after handling the performance monitor interrupt. The IPL ROM code clears this bit before passing control to the operating system.
6      DISCOUNT        Disables counting of PMCn when a performance monitor interrupt is signaled (that is, ((PMCnINTCONTROL = 1) & (PMCn[0] = 1) & (ENINT = 1)) or the occurrence of an enabled time base transition with ((INTONBITTRANS = 1) & (ENINT = 1)).
                       0 Signaling a performance monitor interrupt does not affect counting status of PMCn.
                       1 The signaling of a performance monitor interrupt prevents changing of the PMC1 counter. The PMCn counters do not change if PMC2COUNTCTL = 0.
                       Because a time base signal could have occurred along with an enabled counter overflow condition, software should always reset INTONBITTRANS to zero if the value in INTONBITTRANS was a one.
7-8    RTCSELECT       64-bit time base, bit selection enable.
                       00 Pick bit 63 to count
                       01 Pick bit 55 to count
                       10 Pick bit 51 to count
                       11 Pick bit 47 to count
9      INTONBITTRANS   Cause interrupt signaling on bit transition (identified in RTCSELECT) from off to on.
                       0 Do not allow interrupt signal if chosen bit transitions.
                       1 Signal interrupt if chosen bit transitions.
                       Software is responsible for setting and clearing INTONBITTRANS.
10-15  THRESHOLD       Threshold value. The MPC750 supports all 6 bits, allowing threshold values from 0-63. The intent of the THRESHOLD support is to characterize L1 data cache misses.
16     PMC1INTCONTROL  Enables interrupt signaling due to PMC1 counter overflow.
                       0 Disable PMC1 interrupt signaling due to PMC1 counter overflow.
                       1 Enable PMC1 interrupt signaling due to PMC1 counter overflow.
17     PMCINTCONTROL   Enables interrupt signaling due to any PMC2-PMC4 counter overflow. Overrides the setting of DISCOUNT.
                       0 Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
                       1 Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
18     PMCTRIGGER      Can be used to trigger counting of PMC2-PMC4 after PMC1 has overflowed or after a performance monitor interrupt is signaled.
                       0 Enable PMC2-PMC4 counting.
                       1 Disable PMC2-PMC4 counting until either PMC1[0] = 1 or a performance monitor interrupt is signaled.
19-25  PMC1SELECT      PMC1 input selector, 128 events selectable. See Table 2-10.
26-31  PMC2SELECT      PMC2 input selector, 64 events selectable. See Table 2-11.
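The field positions in Table 2-7 translate into shifts and masks as in the following sketch, which again assumes bit 0 is the most-significant bit. The macro and function names are hypothetical. The event encodings used (1 for processor cycles, 2 for completed instructions) are the ones listed in Tables 2-10 and 2-11.

```c
#include <stdint.h>

/* Field encoders for MMCR0 (SPR 952); bit numbers refer to Table 2-7. */
#define MMCR0_ENINT          (UINT32_C(1) << 26)             /* bit 5 */
#define MMCR0_THRESHOLD(x)   (((uint32_t)(x) & 0x3Fu) << 16) /* bits 10-15 */
#define MMCR0_PMC1INTCONTROL (UINT32_C(1) << 15)             /* bit 16 */
#define MMCR0_PMC1SELECT(x)  (((uint32_t)(x) & 0x7Fu) << 6)  /* bits 19-25 */
#define MMCR0_PMC2SELECT(x)  ((uint32_t)(x) & 0x3Fu)         /* bits 26-31 */

/* Hypothetical example: PMC1 counts processor cycles and PMC2 counts
 * completed instructions, with interrupt signaling left disabled. */
static uint32_t mmcr0_cycles_and_instructions(void)
{
    return MMCR0_PMC1SELECT(1) | MMCR0_PMC2SELECT(2);
}
```

Dividing the two counter readings afterwards gives a rough cycles-per-instruction figure for the measured interval.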
MMCR0 can be accessed with mtspr and mfspr using SPR 952.
2.1.2.4.2
User Monitor Mode Control Register 0 (UMMCR0)
The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level software. UMMCR0 can be accessed with mfspr using SPR 936.
2.1.2.4.3
Monitor Mode Control Register 1 (MMCR1)
The monitor mode control register 1 (MMCR1) functions as an event selector for performance monitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register is shown in Figure 2-6.
[Register layout: PMC3SELECT (0-4), PMC4SELECT (5-9), reserved (10-31)]
Figure 2-6. Monitor Mode Control Register 1 (MMCR1)
Bit settings for MMCR1 are shown in Table 2-8. The corresponding events are described in Section 2.1.2.4.5, "Performance Monitor Counter Registers (PMC1-PMC4)."
Table 2-8. MMCR1 Bit Settings
Bits   Name        Description
0-4    PMC3SELECT  PMC3 input selector. 32 events selectable. See Table 2-12 for defined selections.
5-9    PMC4SELECT  PMC4 input selector. 32 events selectable. See Table 2-13 for defined selections.
10-31  --          Reserved
MMCR1 can be accessed with mtspr and mfspr using SPR 956. User-level software can read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1, described in Section 2.1.2.4.4, "User Monitor Mode Control Register 1 (UMMCR1)."
2.1.2.4.4
User Monitor Mode Control Register 1 (UMMCR1)
The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level software. UMMCR1 can be accessed with mfspr using SPR 940.
2.1.2.4.5
Performance Monitor Counter Registers (PMC1-PMC4)
PMC1-PMC4, shown in Figure 2-7, are 32-bit counters that can be programmed to generate interrupt signals when they overflow.
[Register layout: OV (bit 0), Counter Value (bits 1-31)]
Figure 2-7. Performance Monitor Counter Registers (PMC1-PMC4)
The bits contained in the PMCn registers are described in Table 2-9.
Table 2-9. PMCn Bit Settings
Bits  Name           Description
0     OV             Overflow. When this bit is set it indicates that this counter has reached its maximum value.
1-31  Counter value  Indicates the number of occurrences of the specified event.
Counters are considered to overflow when the high-order bit (the sign bit) becomes set; that is, they reach the value 2147483648 (0x8000_0000). However, an interrupt is not signaled unless both PMCn[INTCONTROL] and MMCR0[ENINT] are also set. Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition may occur with MSR[EE] cleared, but the exception is not taken until EE is set. Setting MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt occurs. Software is expected to use mtspr to set PMC explicitly to nonoverflow values. If software sets an overflow value, an erroneous exception may occur. For example, if both PMCn[INTCONTROL] and MMCR0[ENINT] are set and mtspr loads an overflow value, an interrupt signal may be generated without any event counting having taken place. The event to be monitored can be chosen by setting MMCR0[0-9]. The selected events are counted beginning when MMCR0 is set until either MMCR0 is reset or a performance monitor interrupt is generated. Table 2-10 lists the selectable events and their encodings.
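Because overflow is defined as the counter reaching 0x8000_0000, software that wants an interrupt signal after a given number of events can preload the counter with a nonoverflow value that many counts below that point. The sketch below shows the arithmetic; the helper names are hypothetical, and the preload is valid only for event counts from 1 to 0x8000_0000.

```c
#include <stdint.h>
#include <stdbool.h>

#define PMC_OV UINT32_C(0x80000000) /* bit 0: overflow (sign) bit */

/* True if a PMCn value is in the overflow state. */
static bool pmc_overflowed(uint32_t pmc)
{
    return (pmc & PMC_OV) != 0;
}

/* Nonoverflow preload value that reaches 0x8000_0000 after `events`
 * further counts. The caller must keep 1 <= events <= 0x80000000. */
static uint32_t pmc_preload(uint32_t events)
{
    return PMC_OV - events;
}
```

For example, pmc_preload(1000) yields 0x7FFFFC18, which has OV clear and sets OV on the thousandth counted event.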
Table 2-10. PMC1 Events--MMCR0[19-25] Select Encodings
Encoding    Description
000 0000    Register holds current value.
000 0001    Number of processor cycles
000 0010    Number of completed instructions. Does not include folded branches.
000 0011    Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 00 = 15, 01 = 19, 10 = 23, 11 = 31
000 0100    Number of instructions dispatched: 0, 1, or 2 instructions per cycle
000 0101    Number of eieio instructions completed
000 0110    Number of cycles spent performing table search operations for the ITLB
000 0111    Number of accesses that hit the L2
000 1000    Number of valid instruction EAs delivered to the memory subsystem
000 1001    Number of times the address of an instruction being completed matches the address in the IABR
000 1010    Number of loads that miss the L1 with latencies that exceeded the threshold value
000 1011    Number of branches that are unresolved when processed
000 1100    Number of cycles the dispatcher stalls due to a second unresolved branch in the instruction stream
All others  Reserved. May be used in a later revision.
Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 2-11.
Table 2-11. PMC2 Events--MMCR0[26-31] Select Encodings
Encoding    Description
00 0000     Register holds current value.
00 0001     Counts processor cycles.
00 0010     Counts completed instructions. Does not include folded branches.
00 0011     Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100     Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101     Counts L1 instruction cache misses.
00 0110     Counts ITLB misses.
00 0111     Counts L2 instruction misses.
00 1000     Counts branches predicted or resolved not taken.
00 1001     Counts MSR[PR] bit toggles.
00 1010     Counts times reserved load operations completed.
00 1011     Counts completed load and store instructions.
00 1100     Counts snoops to the L1 and the L2.
00 1101     Counts L1 cast-outs to the L2.
00 1110     Counts completed system unit instructions.
00 1111     Counts instruction fetch misses in the L1.
01 0000     Counts branches allowing out-of-order execution that resolved correctly.
All others  Reserved.
Bits MMCR1[0-4] specify events associated with PMC3, as shown in Table 2-12.
Table 2-12. PMC3 Events--MMCR1[0-4] Select Encodings
Encoding    Description
0 0000      Register holds current value.
0 0001      Number of processor cycles
0 0010      Number of completed instructions, not including folded branches.
0 0011      Number of transitions from 0 to 1 of specified bits in the time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
0 0100      Number of instructions dispatched. 0, 1, or 2 per cycle.
0 0101      Number of L1 data cache misses
0 0110      Number of DTLB misses
0 0111      Number of L2 data misses
0 1000      Number of taken branches, including predicted branches.
0 1001      Number of transitions between marked and unmarked processes while in user mode. That is, the number of MSR[PM] toggles while the processor is in user mode.
0 1010      Number of store conditional instructions completed
0 1011      Number of instructions completed from the FPU
0 1100      Number of L2 castouts caused by snoops to modified lines
0 1101      Number of cache operations that hit in the L2 cache
0 1110      Reserved
0 1111      Number of cycles generated by L1 load misses
1 0000      Number of branches in the second speculative stream that resolve correctly
1 0001      Number of cycles the BPU stalls due to LR or CR unresolved dependencies
All others  Reserved. May be used in a later revision.
Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 2-13.
Table 2-13. PMC4 Events--MMCR1[5-9] Select Encodings
Encoding    Comments
00000       Register holds current value
00001       Number of processor cycles
00010       Number of completed instructions, not including folded branches
00011       Number of transitions from 0 to 1 of specified bits in the time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
00100       Number of instructions dispatched. 0, 1, or 2 per cycle.
00101       Number of L2 castouts
00110       Number of cycles spent performing table searches for DTLB accesses
00111       Reserved. May be used in a later revision.
01000       Number of mispredicted branches
01001       Number of transitions between marked and unmarked processes while in supervisor mode. That is, the number of MSR[PM] toggles while the processor is in supervisor mode.
01010       Number of store conditional instructions completed with reservation intact
01011       Number of completed sync instructions
01100       Number of snoop request retries
01101       Number of completed integer operations
01110       Number of cycles the BPU cannot process new branches due to having two unresolved branches
All others  Reserved. May be used in a later revision.
The PMC registers can be accessed with mtspr and mfspr using the following SPR numbers:
* PMC1 is SPR 953
* PMC2 is SPR 954
* PMC3 is SPR 957
* PMC4 is SPR 958
2.1.2.4.6
User Performance Monitor Counter Registers (UPMC1-UPMC4)
The contents of PMC1-PMC4 are reflected to UPMC1-UPMC4, which can be read by user-level software. The UPMC registers can be read with mfspr using the following SPR numbers:
* UPMC1 is SPR 937
* UPMC2 is SPR 938
* UPMC3 is SPR 941
* UPMC4 is SPR 942
2.1.2.4.7
Sampled Instruction Address Register (SIA)
The sampled instruction address register (SIA) is a supervisor-level register that contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The SIA is shown in Figure 2-8.
[Register layout: Instruction Address (bits 0-31)]
Figure 2-8. Sampled Instruction Address Register (SIA)
If the performance monitor interrupt is triggered by a threshold event, the SIA contains the address of the exact instruction (called the sampled instruction) that caused the counter to overflow. If the performance monitor interrupt was caused by something besides a threshold event, the SIA contains the address of the last instruction completed during that cycle. SIA can be accessed with the mtspr and mfspr instructions using SPR 955.
2.1.2.4.8
User Sampled Instruction Address Register (USIA)
The contents of SIA are reflected to USIA, which can be read by user-level software. USIA can be accessed with the mfspr instruction using SPR 939.
2.1.2.4.9
Sampled Data Address Register (SDA) and User Sampled Data Address Register (USDA)
The MPC750 does not implement the sampled data address register (SDA) or the user-level, read-only USDA registers. However, for compatibility with processors that do, those registers can be written to by boot code without causing an exception. SDA is SPR 959; USDA is SPR 943.
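Collecting the SPR numbers quoted in this section shows a regular pattern: each user-level read-only alias listed here is numbered 16 below its supervisor-level register (for example, PMC1 at 953 and UPMC1 at 937). The sketch below records the numbers and that offset. The enum and function names are hypothetical, and the offset is an observation from the numbers above rather than a rule stated by the manual.

```c
/* Supervisor-level performance monitor SPR numbers quoted in this section. */
enum mpc750_pm_spr {
    SPR_MMCR0 = 952,
    SPR_PMC1  = 953,
    SPR_PMC2  = 954,
    SPR_SIA   = 955,
    SPR_MMCR1 = 956,
    SPR_PMC3  = 957,
    SPR_PMC4  = 958,
    SPR_SDA   = 959
};

/* User-level alias (UMMCRn, UPMCn, USIA, USDA) of a register above.
 * Only meaningful for the eight enum values; not a general SPR rule. */
static int pm_user_alias_spr(enum mpc750_pm_spr supervisor_spr)
{
    return (int)supervisor_spr - 16;
}
```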
2.1.3
Instruction Cache Throttling Control Register (ICTC)
Reducing the rate of instruction fetching can control junction temperature without the complexity and overhead of dynamic clock control. System software can control instruction forwarding by writing a nonzero value to the ICTC register, a supervisor-level register shown in Figure 2-9. The overall junction temperature reduction comes from the dynamic power management of each functional unit when the MPC750 is idle in between instruction fetches. PLL (phase-locked loop) and DLL (delay-locked loop) configurations are unchanged.
[Register layout: reserved (bits 0-22), FI (bits 23-30), E (bit 31)]
Figure 2-9. Instruction Cache Throttling Control Register (ICTC)
Table 2-14 describes the bit fields for the ICTC register.
Table 2-14. ICTC Bit Settings
Bits   Name  Description
0-22   --    Reserved
23-30  FI    Instruction forwarding interval expressed in processor clocks.
             0x00 0 clock cycles
             0x01 1 clock cycle
             ...
             0xFF 255 clock cycles
31     E     Cache throttling enable.
             0 Disable instruction cache throttling.
             1 Enable instruction cache throttling.
Instruction cache throttling is enabled by setting ICTC[E] and writing the instruction forwarding interval into ICTC[FI]. Enabling, disabling, and changing the instruction forwarding interval affect instruction forwarding immediately. The ICTC register can be accessed with the mtspr and mfspr instructions using SPR 1019.
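With FI in bits 23-30 and E in bit 31, an ICTC value is the forwarding interval shifted left by one with the enable bit in the lowest position. The sketch below shows the encoding; the macro and function names are hypothetical, and bit 0 is taken to be the most-significant bit.

```c
#include <stdint.h>
#include <stdbool.h>

#define ICTC_E        UINT32_C(0x1) /* bit 31: throttling enable */
#define ICTC_FI_SHIFT 1             /* FI occupies bits 23-30 */
#define ICTC_FI_MASK  UINT32_C(0xFF)

/* Compose an ICTC (SPR 1019) value from a forwarding interval in
 * processor clocks (0-255) and the enable flag. */
static uint32_t ictc_encode(unsigned interval_clocks, bool enable)
{
    uint32_t v = ((uint32_t)interval_clocks & ICTC_FI_MASK) << ICTC_FI_SHIFT;
    if (enable)
        v |= ICTC_E;
    return v;
}
```

For example, ictc_encode(255, true) gives 0x1FF, the longest interval with throttling enabled, and ictc_encode(0, false) gives 0.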
2.1.4
Thermal Management Registers (THRM1-THRM3)
The on-chip thermal management assist unit provides the following functions:
* Compares the junction temperature against user-programmed thresholds
* Generates a thermal management interrupt if the temperature crosses the threshold
* Provides a way for a successive approximation routine to estimate junction temperature
Control and access to the thermal management assist unit is through the privileged mtspr/mfspr instructions to the three THRM registers. THRM1 and THRM2, shown in Figure 2-10, provide the ability to compare the junction temperature against two user-provided thresholds. Having dual thresholds allows thermal management software differing degrees of action in reducing junction temperature. Thermal management can use a single-threshold mode in which the thermal sensor output is compared to only one threshold in either THRM1 or THRM2.
[Register layout: TIN (0), TIV (1), THRESHOLD (2-8), reserved (9-28), TID (29), TIE (30), V (31)]
Figure 2-10. Thermal Management Registers 1-2 (THRM1-THRM2)
The bits in THRM1 and THRM2 are described in Table 2-15.
Table 2-15. THRM1-THRM2 Bit Settings
Bits  Field      Description
0     TIN        Thermal management interrupt bit. Read-only. This bit is set if the thermal sensor output crosses the threshold specified in the SPR. The state of TIN is valid only if TIV is set. The interpretation of TIN is controlled by TID. See Table 2-16.
1     TIV        Thermal management interrupt valid. Read-only. This bit is set by the thermal assist logic to indicate that the thermal management interrupt (TIN) state is valid. See Table 2-16.
2-8   Threshold  Threshold that the thermal sensor output is compared to. The range is 0-127 °C, and each bit represents 1 °C. Note that this is not the resolution of the thermal sensor.
9-28  --         Reserved. System software should clear these bits when writing to the THRMn SPRs.
29    TID        Thermal management interrupt direction bit. Selects the result of the temperature comparison to set TIN and to assert a thermal management interrupt if TIE is set. If TID is cleared, TIN is set and an interrupt occurs if the junction temperature exceeds the threshold. If TID is set, TIN is set and an interrupt is indicated if the junction temperature is below the threshold. See Table 2-16.
30    TIE        Thermal management interrupt enable. The thermal management interrupt is maskable by the MSR[EE] bit. If TIE is cleared and THRMn is valid, the TIN bit records the status of the junction temperature vs. threshold comparison without causing an exception. This lets system software successively approximate the junction temperature. See Table 2-16.
31    V          SPR valid bit. Setting this bit indicates the SPR contains a valid threshold and valid TID and TIE control bits. THRM1/2[V] = 1 and THRM3[E] = 1 enable the thermal sensor operation. See Table 2-16.
If an mtspr affects a THRM register that contains operating parameters for an ongoing comparison during operation of the thermal assist unit, the respective TIV bits are cleared and the comparison is restarted. Changing THRM3 forces the TIV bits of both THRM1 and THRM2 to 0, and restarts the comparison if THRM3[E] is set. Examples of valid THRM1/THRM2 bit settings are shown in Table 2-16.
Table 2-16. Valid THRM1/THRM2 States
TIN(1)  TIV(1)  TID  TIE  V  Description
x       x       x    x    0  Invalid entry. The threshold in the SPR is not used for comparison.
x       x       x    0    1  Disable thermal management interrupt assertion.
x       x       0    x    1  Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature exceeds the threshold.
x       x       1    x    1  Set TIN and assert thermal management interrupt if TIE = 1 and the junction temperature is less than the threshold.
x       0       x    x    1  The state of the TIN bit is not valid.
0       1       0    x    1  The junction temperature is less than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1       1       0    x    1  The junction temperature is greater than the threshold and as a result the thermal management interrupt is generated if TIE = 1.
0       1       1    x    1  The junction temperature is greater than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1       1       1    x    1  The junction temperature is less than the threshold and as a result the thermal management interrupt is generated if TIE = 1.
Note 1: TIN and TIV are read-only status bits.
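The field positions in Table 2-15 can be turned into masks once the PowerPC bit numbering (bit 0 is the most-significant bit) is taken into account. The following is a minimal sketch, not code from this manual: the macro names and the helpers thrm_encode and thrm_tripped are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end:
 * bit 0 is the MSB of a 32-bit register, bit 31 the LSB. */
#define PPC_BIT(n)        (UINT32_C(1) << (31 - (n)))

#define THRM_TIN          PPC_BIT(0)      /* read-only status          */
#define THRM_TIV          PPC_BIT(1)      /* read-only status          */
#define THRM_THRESH_SHIFT 23              /* threshold is bits 2-8     */
#define THRM_THRESH_MASK  UINT32_C(0x7F)  /* 7 bits, 0-127 degrees C   */
#define THRM_TID          PPC_BIT(29)
#define THRM_TIE          PPC_BIT(30)
#define THRM_V            PPC_BIT(31)

/* Hypothetical helper: build a value to write to THRM1 or THRM2.
 * Reserved bits 9-28 and the read-only TIN/TIV bits are left clear. */
static uint32_t thrm_encode(unsigned threshold_c, bool tid, bool tie)
{
    uint32_t v = THRM_V;
    v |= ((uint32_t)threshold_c & THRM_THRESH_MASK) << THRM_THRESH_SHIFT;
    if (tid)
        v |= THRM_TID;
    if (tie)
        v |= THRM_TIE;
    return v;
}

/* Hypothetical helper: TIN is meaningful only while TIV is set. */
static bool thrm_tripped(uint32_t thrm)
{
    return (thrm & THRM_TIV) != 0 && (thrm & THRM_TIN) != 0;
}
```

A value produced by thrm_encode would be written with mtspr, and a later mfspr result passed to thrm_tripped; checking TIV first mirrors the rows of Table 2-16 in which TIN is not valid.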
The THRM3 register, shown in Figure 2-11, is used to enable the thermal assist unit and to control the comparator output sample time. The thermal assist logic manages the thermal management interrupt generation and time-multiplexed comparisons in dual-threshold mode as well as other control functions.
Register fields (bit positions): Reserved (0-17), Sampled Interval Timer Value, SITV (18-30), E (31)
Figure 2-11. Thermal Management Register 3 (THRM3)
The bits in THRM3 are described in Table 2-17.
Table 2-17. THRM3 Bit Settings
Bits   Name  Description
0-17   --    Reserved for future use. System software should clear these bits when writing to THRM3.
18-30  SITV  Sample interval timer value. Number of elapsed processor clock cycles before a junction temperature vs. threshold comparison result is sampled for TIN bit setting and interrupt generation. This is necessary due to the thermal sensor, DAC, and the analog comparator settling time being greater than the processor cycle time. The value should be configured to allow a sampling interval of 20 microseconds.
31     E     Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V] is set.
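As a worked example of the SITV description, the cycle count for a 20-microsecond interval is the core frequency multiplied by the interval. The sketch below is illustrative only: thrm3_encode is a hypothetical helper, and returning 0 when the count does not fit the 13-bit field is a convention chosen here, not behavior defined by the manual.

```c
#include <stdint.h>

#define THRM3_E          UINT32_C(1)       /* bit 31                     */
#define THRM3_SITV_SHIFT 1                 /* SITV occupies bits 18-30   */
#define THRM3_SITV_MAX   UINT32_C(0x1FFF)  /* 13-bit field, max 8191     */

/* Hypothetical helper: compute a THRM3 value with E set and SITV equal
 * to the number of core clock cycles in interval_us microseconds
 * (rounded up). Returns 0 if the cycle count does not fit in SITV. */
static uint32_t thrm3_encode(uint32_t core_hz, uint32_t interval_us)
{
    uint64_t cycles =
        ((uint64_t)core_hz * interval_us + 999999u) / 1000000u;

    if (cycles > THRM3_SITV_MAX)
        return 0;
    return ((uint32_t)cycles << THRM3_SITV_SHIFT) | THRM3_E;
}
```

For a 400-MHz core, 20 microseconds corresponds to 8000 cycles, which fits in the field. At 500 MHz the same interval needs 10000 cycles, which exceeds the 13-bit maximum of 8191, so this helper reports failure rather than silently truncating.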
The THRM registers can be accessed with the mtspr and mfspr instructions using the following SPR numbers:
* THRM1 is SPR 1020
* THRM2 is SPR 1021
* THRM3 is SPR 1022
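The TIE description in Table 2-15 notes that software can successively approximate the junction temperature by polling TIN with interrupts disabled. One way to do this is a binary search over the 7-bit threshold, sketched below under several assumptions: the thrm_io hooks are hypothetical stand-ins for mtspr/mfspr wrappers, THRM3[E] is assumed to be already set, and the sim_* functions are a toy software model used only so the sketch can run off-target.

```c
#include <stdint.h>

#define SPR_THRM1 1020
#define THRM_TIN  (UINT32_C(1) << 31)   /* bit 0  */
#define THRM_TIV  (UINT32_C(1) << 30)   /* bit 1  */
#define THRM_V    UINT32_C(1)           /* bit 31 */

/* Hypothetical access hooks; on hardware these would wrap the
 * privileged mtspr and mfspr instructions. */
typedef struct {
    void     (*write_spr)(void *ctx, int spr, uint32_t value);
    uint32_t (*read_spr)(void *ctx, int spr);
    void     *ctx;
} thrm_io;

/* Binary search with TID = 0 and TIE = 0: find the smallest threshold
 * (0-127) that the junction temperature does not exceed. */
static int thrm_estimate_c(const thrm_io *io)
{
    int lo = 0, hi = 127;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        uint32_t status;

        io->write_spr(io->ctx, SPR_THRM1,
                      ((uint32_t)mid << 23) | THRM_V);
        do {    /* a write clears TIV; wait for a valid comparison */
            status = io->read_spr(io->ctx, SPR_THRM1);
        } while (!(status & THRM_TIV));

        if (status & THRM_TIN)
            lo = mid + 1;   /* temperature exceeds mid */
        else
            hi = mid;
    }
    return lo;
}

/* Toy model of the comparison, for off-target experiments only. */
typedef struct { int junction_c; uint32_t thrm1; } sim_unit;

static void sim_write(void *ctx, int spr, uint32_t value)
{
    sim_unit *u = ctx;
    if (spr == SPR_THRM1)
        u->thrm1 = value & ~(THRM_TIN | THRM_TIV);
}

static uint32_t sim_read(void *ctx, int spr)
{
    sim_unit *u = ctx;
    uint32_t v;

    if (spr != SPR_THRM1)
        return 0;
    v = u->thrm1;
    if (v & THRM_V) {
        v |= THRM_TIV;
        if (u->junction_c > (int)((v >> 23) & 0x7F))
            v |= THRM_TIN;
    }
    return v;
}
```

The search needs at most seven comparisons. On real hardware each comparison takes at least one sample interval, and the result is limited by the sensor's own resolution, which the threshold description says is coarser than 1°C.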
2.1.5
L2 Cache Control Register (L2CR)
The L2 cache control register, shown in Figure 2-12, is a supervisor-level, implementation-specific SPR used to configure and operate the L2 cache. It is cleared by a hard reset or power-on reset.
Register fields (bit positions): L2E (0), L2PE (1), L2SIZ (2-3), L2CLK (4-6), L2RAM (7-8), L2DO (9), L2I (10), L2CTL (11), L2WT (12), L2TS (13), L2OH (14-15), L2SL (16), L2DF (17), L2BYP (18), Reserved (19-30), L2IP (31)
Figure 2-12. L2 Cache Control Register (L2CR)
The L2 cache interface is described in Chapter 9, "L2 Cache Interface Operation." The L2CR bits are described in Table 2-18.
Table 2-18. L2CR Bit Settings
Bits   Name   Function
0      L2E    L2 enable. Enables L2 cache operation (including snooping) starting with the next transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be configured through L2CR[L2CLK], and the L2 DLL must stabilize (see the hardware specifications). All other L2CR bits must be set appropriately. The L2 cache may need to be invalidated globally.
1      L2PE   L2 data parity generation and checking enable. Enables parity generation and checking for the L2 data RAM interface. When disabled, generated parity is always zeros.
2-3    L2SIZ  L2 size--Should be set according to the size of the L2 data RAMs used. A 256-Kbyte L2 cache requires a data RAM configuration of 32K x 64 bits; a 512-Kbyte L2 cache requires a configuration of 64K x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128K x 64 bits.
              00 Reserved
              01 256 Kbyte
              10 512 Kbyte
              11 1 Mbyte
4-6    L2CLK  L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio based from the core clock frequency that the L2 data RAM interface is to operate at. When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For nonzero values, the processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must stabilize before the L2 interface can be enabled. (See the hardware specifications.) The resulting L2 clock frequency cannot be slower than the clock frequency of the 60x bus interface.
              000 L2 clock and DLL disabled
              001 /1
              010 /1.5
              011 Reserved
              100 /2
              101 /2.5
              110 /3
              111 Reserved
7-8    L2RAM  L2 RAM type--Configures the L2 RAM interface for the type of synchronous SRAMs used:
              * Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow data out
              * Pipelined (register-register) synchronous burst SRAMs that clock addresses in and clock data out
              * Late-write synchronous SRAMs, for which the MPC750 requires a pipelined (register-register) configuration. Late-write RAMs require write data to be valid on the cycle after WE is asserted, rather than on the same cycle as the write enable as with traditional burst RAMs.
              For burst RAM selections, the MPC750 does not burst data into the L2 cache; it generates an address for each access. Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used only for L2 clock modes divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed).
              00 Flow-through (register-buffer) synchronous burst SRAM
              01 Reserved
              10 Pipelined (register-register) synchronous burst SRAM
              11 Pipelined (register-register) synchronous late-write SRAM
9      L2DO   L2 data-only. Setting this bit enables data-only operation in the L2 cache. For this operation, only transactions from the L1 data cache can be cached in the L2 cache, which treats all transactions from the L1 instruction cache as cache-inhibited (bypass L2 cache, no L2 checking done). This bit is provided for L2 testing only.
10     L2I    L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 bits including status bits. This bit must not be set while the L2 cache is enabled.
11     L2CTL  L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function. While L2CTL is asserted, L2ZZ asserts automatically when the MPC750 enters nap or sleep mode and negates automatically when the MPC750 exits nap or sleep mode. This bit should not be set when the MPC750 is in nap mode and snooping is to be performed through deassertion of QACK.
12     L2WT   L2 write-through. Setting L2WT selects write-through mode (rather than the default write-back mode) so all writes to the L2 cache also write through to the 60x bus. For these writes, the L2 cache entry is always marked as clean (valid unmodified) rather than dirty (valid modified). This bit must never be asserted after the L2 cache has been enabled as previously-modified lines can get remarked as clean during normal operation.
13     L2TS   L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of hit. This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily initialize the L2 cache with any address and data information. This bit also keeps dcbz instructions from being broadcast on the 60x and single-beat cacheable store misses in the L2 from being written to the 60x bus.
14-15  L2OH   L2 output hold. These bits configure output hold time for address, data, and control signals driven by the MPC750 to the L2 data RAMs. They should generally be set according to the SRAM's input hold time requirements, for which late-write SRAMs usually differ from flow-through or burst SRAMs.
              00 0.5 ns
              01 1.0 ns
              1x Reserved
16     L2SL   L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 150 MHz.
17     L2DF   L2 differential clock. Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, the B clock is driven as the logical complement of the A clock. This mode supports the differential clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs are used.
18     L2BYP  L2 DLL bypass. The DLL unit receives three input clocks:
              (1) A square-wave clock from the PLL unit to phase adjust and export
              (2) A non-square-wave clock for the internal phase reference
              (3) A feedback clock (L2SYNC_IN) for the external phase reference
              Asserting L2BYP causes clock #2 to be used as clocks #1 and #2. (Clock #2 is the actual clock used by the registers of the L2 interface circuitry.) L2BYP is intended for use when the PLL is being bypassed, and for engineering evaluation. If the PLL is being bypassed, the DLL must be operated in divide-by-1 mode, and SYSCLK must be fast enough for the DLL to support.
19-30  --     Reserved. These bits are implemented but not used; keep at 0 for future compatibility.
31     L2IP   L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been initiated by the L2I bit to determine when it has completed.
The L2CR register can be accessed with the mtspr and mfspr instructions using SPR 1017.
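The encodings in Table 2-18 can be combined into an L2CR configuration value with simple shifts. The sketch below is an illustration under stated assumptions: the enum and function names are hypothetical, only the size, clock, RAM-type, and parity fields are covered, and L2E is deliberately left clear because the table requires the clock to be configured and the DLL to stabilize before the cache is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPR_L2CR   1017
#define PPC_BIT(n) (UINT32_C(1) << (31 - (n)))  /* bit 0 is the MSB */

#define L2CR_L2E   PPC_BIT(0)
#define L2CR_L2PE  PPC_BIT(1)
#define L2CR_L2I   PPC_BIT(10)
#define L2CR_L2IP  PPC_BIT(31)

/* Field encodings copied from Table 2-18. */
enum l2_size { L2SIZ_256K = 1, L2SIZ_512K = 2, L2SIZ_1M = 3 };
enum l2_clk  { L2CLK_DIV1 = 1, L2CLK_DIV1_5 = 2, L2CLK_DIV2 = 4,
               L2CLK_DIV2_5 = 5, L2CLK_DIV3 = 6 };
enum l2_ram  { L2RAM_FLOW_THROUGH = 0, L2RAM_PIPE_BURST = 2,
               L2RAM_PIPE_LATE_WRITE = 3 };

/* Hypothetical check: reject reserved encodings and the combination of
 * flow-through SRAM with divide-by-1 or divide-by-1.5. */
static bool l2cr_config_valid(enum l2_size siz, enum l2_clk clk,
                              enum l2_ram ram)
{
    bool siz_ok = siz == L2SIZ_256K || siz == L2SIZ_512K || siz == L2SIZ_1M;
    bool clk_ok = clk == L2CLK_DIV1 || clk == L2CLK_DIV1_5 ||
                  clk == L2CLK_DIV2 || clk == L2CLK_DIV2_5 ||
                  clk == L2CLK_DIV3;
    bool ram_ok = ram == L2RAM_FLOW_THROUGH || ram == L2RAM_PIPE_BURST ||
                  ram == L2RAM_PIPE_LATE_WRITE;

    if (!siz_ok || !clk_ok || !ram_ok)
        return false;
    if (ram == L2RAM_FLOW_THROUGH &&
        (clk == L2CLK_DIV1 || clk == L2CLK_DIV1_5))
        return false;
    return true;
}

/* Hypothetical encoder: L2SIZ is bits 2-3, L2CLK bits 4-6, L2RAM 7-8.
 * L2E and L2I stay clear in the returned value. */
static uint32_t l2cr_encode(enum l2_size siz, enum l2_clk clk,
                            enum l2_ram ram, bool parity)
{
    uint32_t v = ((uint32_t)siz << 28) | ((uint32_t)clk << 25) |
                 ((uint32_t)ram << 23);
    if (parity)
        v |= L2CR_L2PE;
    return v;
}
```

For example, a 512-Kbyte pipelined burst configuration at divide-by-2 with parity enabled encodes to 0x69000000. Setting L2I, polling L2IP, and finally setting L2E would follow as separate mtspr/mfspr steps, as the L2I and L2IP rows of the table describe.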
2.1.6
Reset Settings
Table 2-19 shows the state of the registers and other resources after a hard reset and before the first instruction is fetched from address 0xFFF0_0100 (the system reset exception vector).
Table 2-19. Settings Caused by Hard Reset (Used at Power-On)
Resource BATs Undefined Setting MSR PMCn PVR Resource Setting 0x0000_0040 (only IP set) Undefined ROM value
Caches (L1 /L2)* Invalidated and disabled CR CTR DABR DAR DEC Undefined Undefined
Reservation address Undefined Cleared 0x0000_0000 0x0000_0000
Breakpoint is disabled. Address is undefined. Reservation flag 0x0000_0000 0xFFFF_FFFF SDR1 SIA
Table 2-19. Settings Caused by Hard Reset (Used at Power-On) (continued)
Resource     Setting
DSISR        0x0000_0000
EAR          0x0000_0000
FPR          Undefined
FPSCR        0x0000_0000
GPR          Undefined
HID0         0x0000_0000
HID1         0x0000_0000
IABR         0x0000_0000 (Breakpoint is disabled.)
ICTC         0x0000_0000
L2CR         0x0000_0000
LR           0x0000_0000
MMCRn        0x0000_0000
SPRG0-SPRG3  0x0000_0000
SRs          Undefined
SRR0         0x0000_0000
SRR1         0x0000_0000
TBU and TBL  0x0000_0000
THRM1-THRM3  0x0000_0000
TLB          Undefined
UMMCRn       0x0000_0000
UPMCn        0x0000_0000
USIA         0x0000_0000
XER          0x0000_0000
* The processor automatically begins operations by issuing an instruction fetch. Because caching is inhibited at start-up, this generates a single-beat load operation on the bus.
2.2
Operand Conventions
This section describes the operand conventions as they are represented in two levels of the PowerPC architecture--UISA and VEA. Detailed descriptions are provided of conventions used for storing values in registers and memory, accessing PowerPC registers, and representation of data in these registers.
2.2.1
Floating-Point Execution Models--UISA
The IEEE 754 standard defines conventions for 64- and 32-bit arithmetic. The standard requires that single-precision arithmetic be provided for single-precision operands. The standard permits double-precision arithmetic instructions to have either (or both) single-precision or double-precision operands, but states that single-precision arithmetic instructions should not accept double-precision operands. The PowerPC UISA follows these guidelines:
* Double-precision arithmetic instructions may have single-precision operands but always produce double-precision results.
* Single-precision arithmetic instructions require all operands to be single-precision and always produce single-precision results.
For arithmetic instructions, conversion from double- to single-precision must be done explicitly by software, while conversion from single- to double-precision is done implicitly by the processor.
All implementations provide the equivalent of the following execution models to ensure that identical results are obtained. The definition of the arithmetic instructions for infinities, denormalized numbers, and NaNs follows the conventions described in the following sections. Although the double-precision format specifies an 11-bit exponent, exponent arithmetic uses two additional bit positions to avoid potential transient overflow conditions. An extra bit is required when denormalized double-precision numbers are prenormalized. A second bit is required to permit computation of the adjusted exponent value in the following examples when the corresponding exception enable bit is one:
* Underflow during multiplication using a denormalized operand
* Overflow during division using a denormalized divisor
2.2.2
Data Organization in Memory and Data Transfers
Bytes in memory are numbered consecutively starting with 0. Each number is the address of the corresponding byte. Memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and load/store string instructions, a sequence of bytes or words. The address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). Operand length is implicit for each instruction.
2.2.3
Alignment and Misaligned Accesses
The operand of a single-register memory access instruction has an alignment boundary equal to its length. An operand's address is misaligned if it is not a multiple of its width. Operands for single-register memory access instructions have the characteristics shown in Table 2-20. Although not permitted as memory operands, quad words are shown because quad-word alignment is desirable for certain memory operands.
The concept of alignment is also applied more generally to data in memory. For example, a 12-byte data item is said to be word-aligned if its address is a multiple of four. Some instructions require their memory operands to have certain alignment. In addition, alignment may affect performance. For single-register memory access instructions, the best performance is obtained when memory operands are aligned.
Instructions are 32 bits (one word) long and must be word-aligned. The MPC750 does not provide hardware support for floating-point memory that is not word-aligned. If a floating-point operand is not aligned, the MPC750 invokes an alignment exception, and it is left up to software to break up the offending storage access operation appropriately. In addition, some non-double-word-aligned memory accesses suffer performance degradation as compared to an aligned access of the same type.
In general, floating-point word accesses should always be word-aligned and floating-point double-word accesses should always be double-word-aligned. Frequent use of misaligned accesses is discouraged since they can degrade overall performance.
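The alignment rule above reduces to a remainder test on the effective address. The following sketch shows that test; the function names are hypothetical and the code only models the rule as stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is aligned to a boundary if it is a multiple of it.
 * For a single-register access the boundary is the operand length
 * in bytes: 1, 2, 4, or 8. The boundary must be nonzero. */
static bool is_aligned(uint32_t ea, uint32_t boundary)
{
    return ea % boundary == 0;
}

/* Per this section, a floating-point operand that is not word-aligned
 * causes an alignment exception that software must handle. */
static bool fp_access_takes_alignment_exception(uint32_t ea)
{
    return !is_aligned(ea, 4);
}
```

With this test, a 12-byte item at address 0x1004 is word-aligned (boundary 4) but not double-word-aligned (boundary 8), matching the example in the text.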
2.2.4
Floating-Point Operand
The MPC750 provides hardware support for all single- and double-precision floating-point operations for most value representations and all rounding modes. This architecture provides for hardware to implement a floating-point system as defined in ANSI/IEEE standard 754-1985, IEEE Standard for Binary Floating Point Arithmetic. Detailed information about the floating-point execution model can be found in Chapter 3, "Operand Conventions," in the Programming Environments Manual. The MPC750 supports non-IEEE mode whenever FPSCR[29] is set. In this mode, denormalized numbers, NaNs, and some IEEE invalid operations are treated in a non-IEEE conforming manner. This is accomplished by delivering results that approximate the values required by the IEEE standard. Table 2-20 summarizes the conditions and mode behavior for operands.
Table 2-20. Floating-Point Operand Data Type Behavior
Operand A Data Type                          Operand B Data Type                          Operand C Data Type                          IEEE Mode (NI = 0)   Non-IEEE Mode (NI = 1)
Single/double denormalized                   Single/double denormalized                   Single/double denormalized                   Normalize all three  Zero all three
Single/double denormalized                   Single/double denormalized                   Normalized or zero                           Normalize A and B    Zero A and B
Normalized or zero                           Single/double denormalized                   Single/double denormalized                   Normalize B and C    Zero B and C
Single/double denormalized                   Normalized or zero                           Single/double denormalized                   Normalize A and C    Zero A and C
Single/double denormalized                   Normalized or zero                           Normalized or zero                           Normalize A          Zero A
Normalized or zero                           Single/double denormalized                   Normalized or zero                           Normalize B          Zero B
Normalized or zero                           Normalized or zero                           Single/double denormalized                   Normalize C          Zero C
Single/double QNaN or SNaN                   Don't care                                   Don't care                                   QNaN (1)             QNaN (1)
Don't care                                   Single/double QNaN or SNaN                   Don't care                                   QNaN (1)             QNaN (1)
Don't care                                   Don't care                                   Single/double QNaN or SNaN                   QNaN (1)             QNaN (1)
Single/double normalized, infinity, or zero  Single/double normalized, infinity, or zero  Single/double normalized, infinity, or zero  Do the operation     Do the operation
Note 1: Prioritize according to Chapter 3, "Operand Conventions," in the Programming Environments Manual.
Table 2-21 summarizes the mode behavior for results.
Table 2-21. Floating-Point Result Data Type Behavior
Precision  Data Type                   IEEE Mode (NI = 0)                                                 Non-IEEE Mode (NI = 1)
Single     Denormalized                Return single-precision denormalized number with trailing zeros.  Return zero.
Single     Normalized, infinity, zero  Return the result.                                                 Return the result.
Single     QNaN, SNaN                  Return QNaN.                                                       Return QNaN.
Single     INT                         Place integer into low word of FPR.                                If (Invalid Operation) then Place (0x8000) into FPR[32-63] else Place integer into FPR[32-63].
Double     Denormalized                Return double-precision denormalized number.                       Return zero.
Double     Normalized, infinity, zero  Return the result.                                                 Return the result.
Double     QNaN, SNaN                  Return QNaN.                                                       Return QNaN.
Double     INT                         Not supported by MPC750                                            Not supported by MPC750
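The per-operand rule in Table 2-20 can be modeled in software by inspecting the exponent and fraction fields of a double-precision value. This is a sketch only: it assumes the host stores double in IEEE 754 binary64 format, the enum and function names are hypothetical, and it classifies one operand at a time rather than reproducing the NaN prioritization referenced in Note 1.

```c
#include <stdint.h>
#include <string.h>

#define FPSCR_NI (UINT32_C(1) << (31 - 29))   /* FPSCR[29], non-IEEE mode */

typedef enum {
    OPERAND_USE_AS_IS,   /* normalized, infinity, or zero     */
    OPERAND_NORMALIZE,   /* denormalized operand, NI = 0      */
    OPERAND_ZERO,        /* denormalized operand, NI = 1      */
    OPERAND_NAN          /* QNaN or SNaN: result is a QNaN    */
} operand_action;

/* Hypothetical model of how one double-precision source operand is
 * treated, given a copy of the FPSCR. */
static operand_action classify_double_operand(double x, uint32_t fpscr)
{
    uint64_t bits;
    uint64_t exponent, fraction;

    memcpy(&bits, &x, sizeof bits);          /* reinterpret safely        */
    exponent = (bits >> 52) & 0x7FF;         /* 11-bit biased exponent    */
    fraction = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    if (exponent == 0x7FF && fraction != 0)
        return OPERAND_NAN;
    if (exponent == 0 && fraction != 0)      /* denormalized              */
        return (fpscr & FPSCR_NI) ? OPERAND_ZERO : OPERAND_NORMALIZE;
    return OPERAND_USE_AS_IS;
}
```

The model shows only the decision, not the arithmetic: in IEEE mode the hardware prenormalizes the operand, while in non-IEEE mode it substitutes zero, which is why results in that mode only approximate the IEEE values.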
2.3
Instruction Set Summary
This chapter describes instructions and addressing modes defined for the MPC750. These instructions are divided into the following functional categories:
* Integer instructions--These include arithmetic and logical instructions. For more information, see Section 2.3.4.1, "Integer Instructions."
* Floating-point instructions--These include floating-point arithmetic instructions, as well as instructions that affect the floating-point status and control register (FPSCR). For more information, see Section 2.3.4.2, "Floating-Point Instructions."
* Load and store instructions--These include integer and floating-point load and store instructions. For more information, see Section 2.3.4.3, "Load and Store Instructions."
* Flow control instructions--These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow. For more information, see Section 2.3.4.4, "Branch and Flow Control Instructions."
* Processor control instructions--These instructions are used for synchronizing memory accesses and managing caches, TLBs, and segment registers. For more information, see Section 2.3.4.6, "Processor Control Instructions--UISA," Section 2.3.5.1, "Processor Control Instructions--VEA," and Section 2.3.6.2, "Processor Control Instructions--OEA."
* Memory synchronization instructions--These instructions are used for memory synchronizing. See Section 2.3.4.7, "Memory Synchronization Instructions--UISA," and Section 2.3.5.2, "Memory Synchronization Instructions--VEA," for more information.
* Memory control instructions--These instructions provide control of caches, TLBs, and segment registers. For more information, see Section 2.3.5.3, "Memory Control Instructions--VEA," and Section 2.3.6.3, "Memory Control Instructions--OEA."
* External control instructions--These include instructions for use with special input/output devices. For more information, see Section 2.3.5.4, "Optional External Control Instructions."
Note that this grouping of instructions does not necessarily indicate the execution unit that processes a particular instruction or group of instructions. This information, which is useful for scheduling instructions most effectively, is provided in Chapter 6, "Instruction Timing."
Integer instructions operate on word operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. The PowerPC architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and word operand loads and stores between memory and a set of 32 general-purpose registers (GPRs). It also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (FPRs).
Arithmetic and logical instructions do not read or modify memory. To use the contents of a memory location in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written to the target location using load and store instructions.
The description of each instruction includes the mnemonic and a formatted list of operands. To simplify assembly language programming, a set of simplified mnemonics and symbols is provided for some of the frequently-used instructions; see Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a complete list of simplified mnemonics. Note that the architecture specification refers to simplified mnemonics as extended mnemonics. Programs written to be portable across the various assemblers for the PowerPC architecture should not assume the existence of mnemonics not described in that document.
2.3.1
Classes of Instructions
The MPC750 instructions belong to one of the following three classes:
* Defined
* Illegal
* Reserved
Note that while the definitions of these terms are consistent among the processors of this family, the assignment of these classifications is not. For example, PowerPC instructions defined for 64-bit implementations are treated as illegal by 32-bit implementations such as the MPC750. The class is determined by examining the primary opcode and the extended opcode, if any. If the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or of a reserved instruction, the instruction is illegal. Instruction encodings that are now illegal may become assigned to instructions in the architecture or may be reserved by being assigned to processor-specific instructions.
2.3.1.1
Definition of Boundedly Undefined
If instructions are encoded with incorrectly set bits in reserved fields, the results on execution can be said to be boundedly undefined. If a user-level program executes the incorrectly coded instruction, the resulting undefined results are bounded in that a spurious change from user to supervisor state is not allowed, and the level of privilege exercised by the program in relation to memory access and other system resources cannot be exceeded. Boundedly-undefined results for a given instruction may vary between implementations, and between execution attempts in the same implementation.
2.3.1.2
Defined Instruction Class
Defined instructions are guaranteed to be supported in all implementations, except as stated in the instruction descriptions in Chapter 8, "Instruction Set," in the Programming Environments Manual. The MPC750 provides hardware support for all instructions defined for 32-bit implementations. It does not support the optional fsqrt, fsqrts, and tlbia instructions. A processor invokes the illegal instruction error handler (part of the program exception) when the unimplemented PowerPC instructions are encountered so they may be emulated in software, as required. Note that the architecture specification refers to exceptions as interrupts.
A defined instruction can have invalid forms. The MPC750 provides limited support for instructions represented in an invalid form.
2.3.1.3
Illegal Instruction Class
Illegal instructions can be grouped into the following categories:
* Instructions not defined in the PowerPC architecture. The following primary opcodes are defined as illegal but may be used in future extensions to the architecture:
  1, 4, 5, 6, 9, 22, 56, 57, 60, 61
  Future versions of the PowerPC architecture may define any of these instructions to perform new functions.
* Instructions defined in the PowerPC architecture but not implemented in a specific implementation. For example, instructions that can be executed on 64-bit processors are considered illegal by 32-bit processors such as the MPC750. The following primary opcodes are defined for 64-bit implementations only and are illegal on the MPC750:
  2, 30, 58, 62
* All unused extended opcodes are illegal. The unused extended opcodes can be determined from information in Section A.2, "Instructions Sorted by Opcode," and Section 2.3.1.4, "Reserved Instruction Class." Notice that extended opcodes for instructions defined only for 64-bit implementations are illegal in 32-bit implementations, and vice versa. The following primary opcodes have unused extended opcodes:
  17, 19, 31, 59, 63
  (Primary opcodes 30 and 62 are illegal for all 32-bit implementations, but as 64-bit opcodes they have some unused extended opcodes.)
* An instruction consisting of only zeros is guaranteed to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized memory invokes the system illegal instruction error handler (a program exception). Note that if only the primary opcode consists of all zeros, the instruction is considered a reserved instruction, as described in Section 2.3.1.4, "Reserved Instruction Class."
The MPC750 invokes the system illegal instruction error handler (a program exception) when it detects any instruction from this class or any instructions defined only for 64-bit implementations. See Section 4.5.7, "Program Exception (0x00700)," for additional information about illegal and invalid instruction exceptions. Except for an instruction consisting of binary zeros, illegal instructions are available for additions to the PowerPC architecture.
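The primary-opcode lists above lend themselves to a small lookup, shown here as an illustrative sketch. The primary opcode is taken from instruction bits 0-5 (the six most-significant bits). The enum and function names are hypothetical, and the sketch does not decode extended opcodes, so opcodes 17, 19, 31, 59, and 63 are only flagged as needing a further check.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    OPC_ALL_ZERO_ILLEGAL,       /* instruction word is all zeros          */
    OPC_PRIMARY_ZERO_RESERVED,  /* primary opcode 0, other bits nonzero   */
    OPC_UNDEFINED_ILLEGAL,      /* 1, 4, 5, 6, 9, 22, 56, 57, 60, 61      */
    OPC_64BIT_ONLY_ILLEGAL,     /* 2, 30, 58, 62                          */
    OPC_CHECK_EXTENDED,         /* 17, 19, 31, 59, 63                     */
    OPC_OTHER                   /* not covered by the lists in this text  */
} opcode_class;

static int in_list(unsigned op, const unsigned char *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (list[i] == op)
            return 1;
    return 0;
}

/* Hypothetical classifier based only on the lists in Section 2.3.1.3. */
static opcode_class classify_primary_opcode(uint32_t insn)
{
    static const unsigned char undefined[] =
        { 1, 4, 5, 6, 9, 22, 56, 57, 60, 61 };
    static const unsigned char only64[]   = { 2, 30, 58, 62 };
    static const unsigned char extended[] = { 17, 19, 31, 59, 63 };
    unsigned op = insn >> 26;    /* bits 0-5 in PowerPC numbering */

    if (insn == 0)
        return OPC_ALL_ZERO_ILLEGAL;
    if (op == 0)
        return OPC_PRIMARY_ZERO_RESERVED;
    if (in_list(op, undefined, sizeof undefined))
        return OPC_UNDEFINED_ILLEGAL;
    if (in_list(op, only64, sizeof only64))
        return OPC_64BIT_ONLY_ILLEGAL;
    if (in_list(op, extended, sizeof extended))
        return OPC_CHECK_EXTENDED;
    return OPC_OTHER;
}
```

An emulation handler for the program exception could use a classification like this as a first filter before deciding whether an unimplemented instruction is worth emulating.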
2.3.1.4
Reserved Instruction Class
Reserved instructions are allocated to specific implementation-dependent purposes not defined by the PowerPC architecture. Attempting to execute an unimplemented reserved instruction invokes the illegal instruction error handler (a program exception). See "Program Exception (0x00700)," in Chapter 6, "Exceptions," in the Programming Environments Manual for information about illegal and invalid instruction exceptions. The PowerPC architecture defines four types of reserved instructions:
* Instructions in the POWER architecture not part of the PowerPC UISA. For details on POWER architecture incompatibilities and how they are handled by processors in this family, see Appendix B, "POWER Architecture Cross Reference," in the Programming Environments Manual.
* Implementation-specific instructions required for the processor to conform to the PowerPC architecture (none of these are implemented in the MPC750)
* All other implementation-specific instructions
* Architecturally-allowed extended opcodes
2.3.2
Addressing Modes
This section provides an overview of conventions for addressing memory and for calculating effective addresses as defined by the PowerPC architecture for 32-bit implementations. For more detailed information, see "Conventions," in Chapter 4, "Addressing Modes and Instruction Set Summary," of the Programming Environments Manual.
2.3.2.1
Memory Addressing
A program references memory using the effective (logical) address computed by the processor when it executes a memory access or branch instruction or when it fetches the next sequential instruction. Bytes in memory are numbered consecutively starting with zero. Each number is the address of the corresponding byte.
2.3.2.2
Memory Operands
Memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and load/store string instructions, a sequence of bytes or words. The address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). Operand length is implicit for each instruction. The PowerPC architecture supports both big-endian and little-endian byte ordering. The default byte and bit ordering is big-endian. See "Byte Ordering," in Chapter 3, "Operand Conventions," of the Programming Environments Manual for more information about big- and little-endian byte ordering.
The operand of a single-register memory access instruction has a natural alignment boundary equal to the operand length. In other words, the "natural" address of an operand is an integral multiple of the operand length. A memory operand is said to be aligned if it is aligned at its natural boundary; otherwise it is misaligned. For a detailed discussion about memory operands, see Chapter 3, "Operand Conventions," of the Programming Environments Manual.
2.3.2.3
Effective Address Calculation
An effective address is the 32-bit sum computed by the processor when executing a memory access or branch instruction or when fetching the next sequential instruction. For a memory access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address, the memory operand is considered to wrap around from the maximum effective address through effective address 0, as described in the following paragraphs. Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. A carry from bit 0 is ignored. Load and store operations have the following modes of effective address generation: * * EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index) EA = (rA|0) + rB (register indirect with index)
Refer to Section 2.3.4.3.2, "Integer Load and Store Address Generation," for a detailed description of effective address generation for load and store operations.
Branch instructions have three categories of effective address generation:
* Immediate
* Link register indirect
* Count register indirect
2.3.2.4 Synchronization
The synchronization described in this section refers to the state of the processor that is performing the synchronization.
2.3.2.4.1 Context Synchronization
The System Call (sc) and Return from Interrupt (rfi) instructions perform context synchronization by allowing previously issued instructions to complete before performing a change in context. Execution of one of these instructions ensures the following:
* No higher priority exception exists (sc).
* All previous instructions have completed to a point where they can no longer cause an exception. If a prior memory access instruction causes direct-store error exceptions, the results are guaranteed to be determined before this instruction is executed.
* Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.
* The instructions following the sc or rfi instruction execute in the context established by these instructions.
2.3.2.4.2 Execution Synchronization
An instruction is execution synchronizing if all previously initiated instructions appear to have completed before the instruction is initiated or, in the case of sync and isync, before the instruction completes. For example, the Move to Machine State Register (mtmsr) instruction is execution synchronizing. It ensures that all preceding instructions have completed execution and cannot cause an exception before the instruction executes, but does not ensure subsequent instructions execute in the newly established environment. For example, if the mtmsr sets the MSR[PR] bit, unless an isync immediately follows the mtmsr instruction, a privileged instruction could be executed or privileged access could be performed without causing an exception even though the MSR[PR] bit indicates user mode.
2.3.2.4.3 Instruction-Related Exceptions
There are two kinds of exceptions in the MPC750--those caused directly by the execution of an instruction and those caused by an asynchronous event (or interrupts). Either may cause components of the system software to be invoked. Exceptions can be caused directly by the execution of an instruction as follows:
* An attempt to execute an illegal instruction causes the illegal instruction (program exception) handler to be invoked. An attempt by a user-level program to execute the supervisor-level instructions listed below causes the privileged instruction (program exception) handler to be invoked. The MPC750 provides the following supervisor-level instructions: dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr, mtsr, mtsrin, rfi, tlbie, and tlbsync. Note that the privilege level of the mfspr and mtspr instructions depends on the SPR encoding.
* Any mtspr, mfspr, or mftb instruction with an invalid SPR (or TBR) field causes an illegal type program exception. Likewise, a program exception is taken if user-level software tries to access a supervisor-level SPR. An mtspr instruction executing in supervisor mode (MSR[PR] = 0) with the SPR field specifying HID1 or PVR (read-only registers) executes as a no-op.
* An attempt to access memory that is not available (page fault) causes the ISI or DSI exception handler to be invoked.
* The execution of an sc instruction invokes the system call exception handler that permits a program to request the system to perform a service.
* The execution of a trap instruction invokes the program exception trap handler.
* The execution of an instruction that causes a floating-point exception while exceptions are enabled in the MSR invokes the program exception handler.
A detailed description of exception conditions is provided in Chapter 4, "Exceptions."
2.3.3 Instruction Set Overview
This section provides a brief overview of the PowerPC instructions implemented in the MPC750 and highlights any special information with respect to how the MPC750 implements a particular instruction. Note that the categories used in this section correspond to those used in Chapter 4, "Addressing Modes and Instruction Set Summary," in the Programming Environments Manual. These categorizations are somewhat arbitrary and are provided for the convenience of the programmer and do not necessarily reflect the PowerPC architecture specification.
Note that some instructions have the following optional features:
* CR Update--The dot (.) suffix on the mnemonic enables the update of the CR.
* Overflow option--The o suffix indicates that the overflow bit in the XER is enabled.
2.3.4 PowerPC UISA Instructions
The PowerPC UISA includes the base user-level instruction set (excluding a few user-level cache control, synchronization, and time base instructions), user-level registers, programming model, data types, and addressing modes. This section discusses the instructions defined in the UISA.
2.3.4.1 Integer Instructions
This section describes the integer instructions. These consist of the following:
* Integer arithmetic instructions
* Integer compare instructions
* Integer logical instructions
* Integer rotate and shift instructions
Integer instructions use the content of the GPRs as source operands and place results into GPRs, into the integer exception register (XER), and into condition register (CR) fields.
2.3.4.1.1 Integer Arithmetic Instructions
Table 2-22 lists the integer arithmetic instructions for the processors in this family.
Table 2-22. Integer Arithmetic Instructions
Name                                Mnemonic                            Syntax
Add Immediate                       addi                                rD,rA,SIMM
Add Immediate Shifted               addis                               rD,rA,SIMM
Add                                 add (add. addo addo.)               rD,rA,rB
Subtract From                       subf (subf. subfo subfo.)           rD,rA,rB
Add Immediate Carrying              addic                               rD,rA,SIMM
Add Immediate Carrying and Record   addic.                              rD,rA,SIMM
Subtract from Immediate Carrying    subfic                              rD,rA,SIMM
Add Carrying                        addc (addc. addco addco.)           rD,rA,rB
Subtract from Carrying              subfc (subfc. subfco subfco.)       rD,rA,rB
Add Extended                        adde (adde. addeo addeo.)           rD,rA,rB
Subtract from Extended              subfe (subfe. subfeo subfeo.)       rD,rA,rB
Add to Minus One Extended           addme (addme. addmeo addmeo.)       rD,rA
Subtract from Minus One Extended    subfme (subfme. subfmeo subfmeo.)   rD,rA
Add to Zero Extended                addze (addze. addzeo addzeo.)       rD,rA
Subtract from Zero Extended         subfze (subfze. subfzeo subfzeo.)   rD,rA
Negate                              neg (neg. nego nego.)               rD,rA
Multiply Low Immediate              mulli                               rD,rA,SIMM
Multiply Low                        mullw (mullw. mullwo mullwo.)       rD,rA,rB
Multiply High Word                  mulhw (mulhw.)                      rD,rA,rB
Multiply High Word Unsigned         mulhwu (mulhwu.)                    rD,rA,rB
Divide Word                         divw (divw. divwo divwo.)           rD,rA,rB
Divide Word Unsigned                divwu (divwu. divwuo divwuo.)       rD,rA,rB
Although there is no Subtract Immediate instruction, its effect can be achieved by using an addi instruction with the immediate operand negated. Simplified mnemonics are provided that include this negation. The subf instructions subtract the second operand (rA) from the third operand (rB). Simplified mnemonics are provided in which the third operand is subtracted from the second operand. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for examples. The UISA states that an implementation that executes instructions that set the overflow enable bit (OE) or the carry bit (CA) may either execute these instructions slowly or prevent execution of the subsequent instruction until the operation completes. Chapter 6, "Instruction Timing," describes how the MPC750 handles CR dependencies. The summary overflow bit (SO) and overflow bit (OV) in the integer exception register are set to reflect an overflow condition of a 32-bit result. This can happen only when OE = 1.
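The operand order of subf, the use of addi with a negated immediate, and the overflow condition on a 32-bit result can be illustrated with a small Python model. This is a sketch under stated assumptions: the function names mirror the mnemonics only for readability, register values are passed directly as integers, and XER[SO] and XER[CA] are not modeled.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def subf(ra, rb):
    """subf rD,rA,rB: rD = rB - rA (rA is subtracted from rB)."""
    return (rb - ra) & MASK32

def addi(ra_value, simm):
    """addi rD,rA,SIMM for rA != 0: rD = rA + SIMM."""
    return (ra_value + simm) & MASK32

def addo(ra, rb):
    """Add with OE = 1: return (result, OV).

    OV is 1 when the exact signed sum does not fit in 32 bits.
    """
    exact = to_signed(ra) + to_signed(rb)
    result = exact & MASK32
    ov = int(exact != to_signed(result))
    return result, ov

print(subf(3, 10))                   # 10 - 3
print(addi(10, -3))                  # "subtract immediate" via a negated SIMM
print(addo(0x7FFFFFFF, 1))           # largest positive value plus one overflows
```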
2.3.4.1.2 Integer Compare Instructions
The integer compare instructions algebraically or logically compare the contents of register rA with either the zero-extended value of the UIMM operand, the sign-extended value of the SIMM operand, or the contents of register rB. The comparison is signed for the cmpi and cmp instructions, and unsigned for the cmpli and cmpl instructions. Table 2-23 summarizes the integer compare instructions.
Table 2-23. Integer Compare Instructions
Name                        Mnemonic   Syntax
Compare Immediate           cmpi       crfD,L,rA,SIMM
Compare                     cmp        crfD,L,rA,rB
Compare Logical Immediate   cmpli      crfD,L,rA,UIMM
Compare Logical             cmpl       crfD,L,rA,rB
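The difference between the signed (cmp, cmpi) and unsigned (cmpl, cmpli) comparisons shows up whenever an operand has its most significant bit set. The following Python sketch models only the LT, GT, and EQ outcome of a comparison; the helper names are assumptions for illustration, and the copy of XER[SO] into the CR field is omitted.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def compare(a, b):
    """Return the (LT, GT, EQ) bits for two already-interpreted operands."""
    return (int(a < b), int(a > b), int(a == b))

def cmp_signed(ra, rb):
    """Models the algebraic comparison used by cmp and cmpi."""
    return compare(to_signed(ra), to_signed(rb))

def cmp_logical(ra, rb):
    """Models the logical (unsigned) comparison used by cmpl and cmpli."""
    return compare(ra & MASK32, rb & MASK32)

# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
print(cmp_signed(0xFFFFFFFF, 1))
print(cmp_logical(0xFFFFFFFF, 1))
```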
The crfD operand can be omitted if the result of the comparison is to be placed in CR0. Otherwise the target CR field must be specified in crfD, using an explicit field number. For information on simplified mnemonics for the integer compare instructions see Appendix F, "Simplified Mnemonics," in the Programming Environments Manual.
2.3.4.1.3 Integer Logical Instructions
The logical instructions shown in Table 2-24 perform bit-parallel operations on the specified operands. Logical instructions with the CR updating enabled (uses dot suffix) and instructions andi. and andis. set CR field CR0 to characterize the result of the logical operation. Logical instructions do not affect XER[SO], XER[OV], or XER[CA]. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for simplified mnemonic examples for integer logical operations.
Table 2-24. Integer Logical Instructions
Name                       Mnemonic            Syntax       Implementation Notes
AND Immediate              andi.               rA,rS,UIMM   --
AND Immediate Shifted      andis.              rA,rS,UIMM   --
OR Immediate               ori                 rA,rS,UIMM   See note below
OR Immediate Shifted       oris                rA,rS,UIMM   --
XOR Immediate              xori                rA,rS,UIMM   --
XOR Immediate Shifted      xoris               rA,rS,UIMM   --
AND                        and (and.)          rA,rS,rB     --
OR                         or (or.)            rA,rS,rB     --
XOR                        xor (xor.)          rA,rS,rB     --
NAND                       nand (nand.)        rA,rS,rB     --
NOR                        nor (nor.)          rA,rS,rB     --
Equivalent                 eqv (eqv.)          rA,rS,rB     --
AND with Complement        andc (andc.)        rA,rS,rB     --
OR with Complement         orc (orc.)          rA,rS,rB     --
Extend Sign Byte           extsb (extsb.)      rA,rS        --
Extend Sign Half Word      extsh (extsh.)      rA,rS        --
Count Leading Zeros Word   cntlzw (cntlzw.)    rA,rS        --
Implementation note for ori--The PowerPC architecture defines ori r0,r0,0 as the preferred form for the no-op instruction. The dispatcher discards this instruction (except for pending trace or breakpoint exceptions).
2.3.4.1.4 Integer Rotate and Shift Instructions
Rotation operations are performed on data from a GPR, and the result, or a portion of the result, is returned to a GPR. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a complete list of simplified mnemonics that allows simpler coding of often-used functions such as clearing the leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, and simple rotates and shifts. Integer rotate instructions rotate the contents of a register. The result of the rotation is either inserted into the target register under control of a mask (if a mask bit is 1 the associated bit of the rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register is unchanged), or ANDed with a mask before being placed into the target register. The integer rotate instructions are summarized in Table 2-25.
Table 2-25. Integer Rotate Instructions
Name                                             Mnemonic            Syntax
Rotate Left Word Immediate then AND with Mask    rlwinm (rlwinm.)    rA,rS,SH,MB,ME
Rotate Left Word then AND with Mask              rlwnm (rlwnm.)      rA,rS,rB,MB,ME
Rotate Left Word Immediate then Mask Insert      rlwimi (rlwimi.)    rA,rS,SH,MB,ME
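The rotate-then-mask behavior described above can be sketched in Python. The helper names rotl32 and mask32 are hypothetical, and the model assumes the big-endian bit numbering used throughout this manual (bit 0 is the most significant bit), with the mask covering bits MB through ME and wrapping around when MB is greater than ME.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 <= n <= 31)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask32(mb, me):
    """Mask with 1s in bits mb..me, where bit 0 is the most significant bit.

    When mb > me the mask wraps around through bit 31 and bit 0.
    """
    from_mb = MASK32 >> mb                      # 1s in bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32   # 1s in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)

def rlwinm(rs, sh, mb, me):
    """Rotate left by sh, then AND with the mask."""
    return rotl32(rs, sh) & mask32(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left by sh, then insert into ra under control of the mask."""
    m = mask32(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

# A logical shift left by 8 expressed as a rotate with a mask.
print(hex(rlwinm(0x12345678, 8, 0, 23)))
# A logical shift right by 8 expressed as a rotate with a mask.
print(hex(rlwinm(0x12345678, 24, 8, 31)))
# Insert the low byte of rs into bits 16..23 of ra, leaving other bits alone.
print(hex(rlwimi(0xAAAAAAAA, 0x000000FF, 8, 16, 23)))
```

The first two calls show how immediate-form logical shifts fall out of rlwinm by choosing the shift amount and mask bounds together.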
The integer shift instructions perform left and right shifts. Immediate-form logical (unsigned) shift operations are obtained by specifying masks and shift values for certain rotate instructions. Simplified mnemonics (shown in Appendix F, "Simplified Mnemonics," in the Programming Environments Manual) are provided to make coding of such shifts simpler and easier to understand.
Multiple-precision shifts can be programmed as shown in Appendix C, "Multiple-Precision Shifts," in the Programming Environments Manual. The integer shift instructions are summarized in Table 2-26.
Table 2-26. Integer Shift Instructions
Name                                  Mnemonic          Syntax
Shift Left Word                       slw (slw.)        rA,rS,rB
Shift Right Word                      srw (srw.)        rA,rS,rB
Shift Right Algebraic Word Immediate  srawi (srawi.)    rA,rS,SH
Shift Right Algebraic Word            sraw (sraw.)      rA,rS,rB
2.3.4.2 Floating-Point Instructions
This section describes the floating-point instructions, which include the following:
* Floating-point arithmetic instructions
* Floating-point multiply-add instructions
* Floating-point rounding and conversion instructions
* Floating-point compare instructions
* Floating-point status and control register instructions
* Floating-point move instructions
See Section 2.3.4.3, "Load and Store Instructions," for information about floating-point loads and stores. The PowerPC architecture supports a floating-point system as defined in the IEEE 754 standard, but requires software support to conform with that standard. All floating-point operations conform to the IEEE 754 standard, except if software sets the non-IEEE mode FPSCR[NI].
2.3.4.2.1 Floating-Point Arithmetic Instructions
The floating-point arithmetic instructions are summarized in Table 2-27.
Table 2-27. Floating-Point Arithmetic Instructions
Name                                        Mnemonic              Syntax
Floating Add (Double-Precision)             fadd (fadd.)          frD,frA,frB
Floating Add Single                         fadds (fadds.)        frD,frA,frB
Floating Subtract (Double-Precision)        fsub (fsub.)          frD,frA,frB
Floating Subtract Single                    fsubs (fsubs.)        frD,frA,frB
Floating Multiply (Double-Precision)        fmul (fmul.)          frD,frA,frC
Floating Multiply Single                    fmuls (fmuls.)        frD,frA,frC
Floating Divide (Double-Precision)          fdiv (fdiv.)          frD,frA,frB
Floating Divide Single                      fdivs (fdivs.)        frD,frA,frB
Floating Reciprocal Estimate Single 1       fres (fres.)          frD,frB
Floating Reciprocal Square Root Estimate 1  frsqrte (frsqrte.)    frD,frB
Floating Select 1                           fsel                  frD,frA,frC,frB
1 The fsel instruction is optional in the PowerPC architecture.
All single-precision arithmetic instructions are performed using a double-precision format. The floating-point architecture is a single-pass implementation for double-precision products. In most cases, a single-precision instruction using only single-precision operands, in double-precision format, has the same latency as its double-precision equivalent.
2.3.4.2.2 Floating-Point Multiply-Add Instructions
These instructions combine multiply and add operations without an intermediate rounding operation. The floating-point multiply-add instructions are summarized in Table 2-28.
Table 2-28. Floating-Point Multiply-Add Instructions
Name                                                    Mnemonic              Syntax
Floating Multiply-Add (Double-Precision)                fmadd (fmadd.)        frD,frA,frC,frB
Floating Multiply-Add Single                            fmadds (fmadds.)      frD,frA,frC,frB
Floating Multiply-Subtract (Double-Precision)           fmsub (fmsub.)        frD,frA,frC,frB
Floating Multiply-Subtract Single                       fmsubs (fmsubs.)      frD,frA,frC,frB
Floating Negative Multiply-Add (Double-Precision)       fnmadd (fnmadd.)      frD,frA,frC,frB
Floating Negative Multiply-Add Single                   fnmadds (fnmadds.)    frD,frA,frC,frB
Floating Negative Multiply-Subtract (Double-Precision)  fnmsub (fnmsub.)      frD,frA,frC,frB
Floating Negative Multiply-Subtract Single              fnmsubs (fnmsubs.)    frD,frA,frC,frB
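The practical effect of omitting the intermediate rounding can be seen by comparing a fused result with a separately rounded multiply and add. The Python sketch below is a model, not the hardware algorithm: it uses exact rational arithmetic to stand in for the unrounded intermediate product, assumes round-to-nearest for the single final rounding, and handles finite operands only. The function names are hypothetical.

```python
from fractions import Fraction

def fmadd(fra, frc, frb):
    """Model fmadd frD,frA,frC,frB: (frA * frC) + frB with one final rounding.

    Fraction arithmetic is exact, so the only rounding happens in float().
    Finite operands only; round-to-nearest is assumed.
    """
    exact = Fraction(fra) * Fraction(frc) + Fraction(frb)
    return float(exact)

def multiply_then_add(fra, frc, frb):
    """Separate multiply and add: the product is rounded before the add."""
    return fra * frc + frb

a = 1.0 + 2.0 ** -30
c = 1.0 - 2.0 ** -30
b = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(multiply_then_add(a, c, b))  # the small term is lost
print(fmadd(a, c, b))              # the small term survives
```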
2.3.4.2.3 Floating-Point Rounding and Conversion Instructions
The Floating Round to Single-Precision (frsp) instruction is used to round a 64-bit double-precision number to a 32-bit single-precision floating-point number. The floating-point convert instructions convert a 64-bit double-precision floating-point number to a 32-bit signed integer number. Examples of uses of these instructions to perform various conversions can be found in Appendix D, "Floating-Point Models," in the Programming Environments Manual.
Table 2-29. Floating-Point Rounding and Conversion Instructions
Name                                                      Mnemonic            Syntax
Floating Round to Single                                  frsp (frsp.)        frD,frB
Floating Convert to Integer Word                          fctiw (fctiw.)      frD,frB
Floating Convert to Integer Word with Round toward Zero   fctiwz (fctiwz.)    frD,frB
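A rough Python model of two of these conversions is shown below. It is a sketch under explicit assumptions: frsp is modeled with round-to-nearest only (the rounding mode selected in the FPSCR is not modeled), and the saturating results for out-of-range and NaN inputs in fctiwz reflect the behavior assumed here for the case where the corresponding exceptions are disabled. FPSCR side effects are not modeled.

```python
import math
import struct

def frsp(x):
    """Round a double to single precision (round-to-nearest assumed).

    struct.pack raises OverflowError for finite values too large for a
    single; that case is not handled in this sketch.
    """
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fctiwz(x):
    """Convert to a 32-bit signed integer, rounding toward zero.

    Out-of-range inputs saturate and NaN maps to the most negative
    integer (assumed behavior with exceptions disabled).
    """
    if math.isnan(x):
        return -0x80000000
    if x >= 2 ** 31:
        return 0x7FFFFFFF
    if x <= -(2 ** 31):
        return -0x80000000
    return math.trunc(x)

print(frsp(1.0 + 2.0 ** -30))  # too fine for a single: rounds to 1.0
print(fctiwz(-2.9))            # rounds toward zero
print(fctiwz(3e10))            # saturates at the largest 32-bit integer
```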
2.3.4.2.4 Floating-Point Compare Instructions
Floating-point compare instructions compare the contents of two floating-point registers. The comparison ignores the sign of zero (that is +0 = -0). The floating-point compare instructions are summarized in Table 2-30.
Table 2-30. Floating-Point Compare Instructions
Name                         Mnemonic   Syntax
Floating Compare Unordered   fcmpu      crfD,frA,frB
Floating Compare Ordered     fcmpo      crfD,frA,frB
The PowerPC architecture allows an fcmpu or fcmpo instruction with the Rc bit set to produce a boundedly-undefined result, which may include an illegal instruction program exception. In the MPC750, crfD should be treated as undefined.
2.3.4.2.5 Floating-Point Status and Control Register Instructions
Every FPSCR instruction appears to synchronize the effects of all floating-point instructions executed by a given processor. Executing an FPSCR instruction ensures that all floating-point instructions previously initiated by the given processor appear to have completed before the FPSCR instruction is initiated and that no subsequent floating-point instructions appear to be initiated by the given processor until the FPSCR instruction has completed. The FPSCR instructions are summarized in Table 2-31.
Table 2-31. Floating-Point Status and Control Register Instructions
Name                                    Mnemonic            Syntax
Move from FPSCR                         mffs (mffs.)        frD
Move to Condition Register from FPSCR   mcrfs               crfD,crfS
Move to FPSCR Field Immediate           mtfsfi (mtfsfi.)    crfD,IMM
Move to FPSCR Fields                    mtfsf (mtfsf.)      FM,frB
Move to FPSCR Bit 0                     mtfsb0 (mtfsb0.)    crbD
Move to FPSCR Bit 1                     mtfsb1 (mtfsb1.)    crbD
Implementation Note--The PowerPC architecture states that in some implementations, the Move to FPSCR Fields (mtfsf) instruction may perform more slowly when only some of the fields are updated as opposed to all of the fields. In the MPC750, there is no degradation of performance.
2.3.4.2.6 Floating-Point Move Instructions
Floating-point move instructions copy data from one FPR to another. The floating-point move instructions do not modify the FPSCR. The CR update option in these instructions controls the placing of result status into CR1. Table 2-32 summarizes the floating-point move instructions.
Table 2-32. Floating-Point Move Instructions
Name                               Mnemonic          Syntax
Floating Move Register             fmr (fmr.)        frD,frB
Floating Negate                    fneg (fneg.)      frD,frB
Floating Absolute Value            fabs (fabs.)      frD,frB
Floating Negative Absolute Value   fnabs (fnabs.)    frD,frB
2.3.4.3 Load and Store Instructions
Load and store instructions are issued and translated in program order; however, the accesses can occur out of order. Synchronizing instructions are provided to enforce strict ordering. This section describes the load and store instructions, which consist of the following:
* Integer load instructions
* Integer store instructions
* Integer load and store with byte-reverse instructions
* Integer load and store multiple instructions
* Floating-point load instructions
* Floating-point store instructions
* Memory synchronization instructions
Implementation Notes--The following describes how the MPC750 handles misalignment:
The MPC750 provides hardware support for misaligned memory accesses. It performs those accesses within a single cycle if the operand lies within a double-word boundary. Misaligned memory accesses that cross a double-word boundary degrade performance. For string operations, the hardware makes no attempt to combine register values to reduce the number of discrete accesses. Combining stores enhances performance if store gathering is enabled and the accesses meet the criteria described in Section 6.4.7, "Integer Store Gathering." Note that the PowerPC architecture requires load/store multiple instruction accesses to be aligned. At a minimum, additional cache access cycles are required.
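Whether a given access is naturally aligned, and whether it stays within one double word, can be checked with simple address arithmetic. The helpers below are a hypothetical sketch for reasoning about access patterns in software; they do not describe how the hardware makes the decision. The 8-byte double word and 4-Kbyte page sizes are the ones used elsewhere in this chapter.

```python
DOUBLE_WORD = 8   # bytes
PAGE = 4096       # bytes (4-Kbyte page)

def is_aligned(ea, length):
    """True if ea is an integral multiple of the operand length."""
    return ea % length == 0

def crosses_boundary(ea, length, boundary):
    """True if bytes ea .. ea+length-1 span more than one boundary-sized block."""
    return ea // boundary != (ea + length - 1) // boundary

# A misaligned word that still lies within one double word.
print(is_aligned(0x1002, 4), crosses_boundary(0x1002, 4, DOUBLE_WORD))
# A misaligned word that straddles two double words.
print(is_aligned(0x1006, 4), crosses_boundary(0x1006, 4, DOUBLE_WORD))
# A misaligned word that straddles two pages.
print(crosses_boundary(0x1FFE, 4, PAGE))
```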
Although many unaligned memory accesses are supported in hardware, the frequent use of them is discouraged since they can compromise the overall performance of the processor. Accesses that cross a translation boundary may be restarted. That is, a misaligned access that crosses a page boundary is completely restarted if the second portion of the access causes a page fault. This may cause the first access to be repeated. On some processors, such as the MPC603, a TLB reload would cause an instruction restart. On the MPC750, TLB reloads are done transparently and only a page fault causes a restart.
2.3.4.3.1 Self-Modifying Code
When a processor modifies a memory location that may be contained in the instruction cache, software must ensure that memory updates are visible to the instruction fetching mechanism. This can be achieved by the following instruction sequence:
dcbst   |update memory
sync    |wait for update
icbi    |remove (invalidate) copy in instruction cache
isync   |remove copy in own instruction buffer
These operations are required because the data cache is a write-back cache. Since instruction fetching bypasses the data cache, changes to items in the data cache may not be reflected in memory until the fetch operations complete. Special care must be taken to avoid coherency paradoxes in systems that implement unified secondary caches, and designers should carefully follow the guidelines for maintaining cache coherency that are provided in the VEA, and discussed in Chapter 5, "Cache Model and Memory Coherency," in the Programming Environments Manual. Because the MPC750 does not broadcast the M bit for instruction fetches, external caches are subject to coherency paradoxes.
2.3.4.3.2 Integer Load and Store Address Generation
Integer load and store operations generate effective addresses using register indirect with immediate index mode, register indirect with index mode, or register indirect mode. See Section 2.3.2.3, "Effective Address Calculation," for information about calculating effective addresses. Note that in some implementations, operations that are not naturally aligned may suffer performance degradation. Refer to Section 4.5.6, "Alignment Exception (0x00600)," for additional information about load and store address alignment exceptions.
2.3.4.3.3 Register Indirect Integer Load Instructions
For integer load instructions, the byte, half word, word, or double word addressed by the EA (effective address) is loaded into rD. Many integer load instructions have an update form, in which rA is updated with the generated effective address. For these forms, if rA ≠ 0 and rA ≠ rD (otherwise invalid), the EA is placed into rA and the memory element (byte, half word, word, or double word) addressed by the EA is loaded into rD. Note that the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms.
Implementation Notes--The following notes describe the MPC750 implementation of integer load instructions:
* The PowerPC architecture cautions programmers that some implementations of the architecture may execute the load half algebraic (lha, lhax) instructions with greater latency than other types of load instructions. This is not the case for the MPC750; these instructions operate with the same latency as other load instructions.
* The PowerPC architecture cautions programmers that some implementations of the architecture may run the load/store byte-reverse (lhbrx, lwbrx, sthbrx, stwbrx) instructions with greater latency than other types of load/store instructions. This is not the case for the MPC750. These instructions operate with the same latency as the other load/store instructions.
* The PowerPC architecture describes some preferred instruction forms for load and store multiple instructions and integer move assist instructions that may perform better than other forms in some implementations. None of these preferred forms affect instruction performance on the MPC750.
* The PowerPC architecture defines the lwarx and stwcx. instructions as a way to update memory atomically. In the MPC750, reservations are made on behalf of aligned 32-byte sections of the memory address space. Executing lwarx and stwcx. to a page marked write-through does not cause a DSI exception if the W bit is set, but as with other memory accesses, DSI exceptions can result for other reasons such as protection violations or page faults.
* In general, because stwcx. always causes an external bus transaction it has slightly worse performance characteristics than normal store operations.
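The update form of a load can be sketched as follows, using lwzu as the example. This is a hypothetical Python model: memory is a bytes object read in big-endian order, the displacement is passed as an already sign-extended integer, and raising ValueError for the invalid forms is purely a modeling choice, since the text above says only that those forms are invalid.

```python
MASK32 = 0xFFFFFFFF

def lwzu(gpr, memory, rd, ra, d):
    """Model lwzu rD,d(rA): load a word, then write the EA back into rA."""
    if ra == 0 or ra == rd:
        # Invalid form per the architecture; this sketch simply rejects it.
        raise ValueError("invalid form: rA = 0 or rA = rD")
    ea = (gpr[ra] + d) & MASK32
    gpr[rd] = int.from_bytes(memory[ea:ea + 4], "big")
    gpr[ra] = ea

# Hypothetical memory image and register contents.
memory = bytes(range(16))
gpr = [0] * 32
gpr[5] = 4

lwzu(gpr, memory, 3, 5, 4)          # EA = 4 + 4 = 8
print(hex(gpr[3]), hex(gpr[5]))     # loaded word and updated base register
```

Because rA ends up holding the EA, a sequence of such loads with the same displacement walks through consecutive memory elements without a separate add.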
Table 2-33 summarizes the integer load instructions.
Table 2-33. Integer Load Instructions
Name                                           Mnemonic   Syntax
Load Byte and Zero                             lbz        rD,d(rA)
Load Byte and Zero Indexed                     lbzx       rD,rA,rB
Load Byte and Zero with Update                 lbzu       rD,d(rA)
Load Byte and Zero with Update Indexed         lbzux      rD,rA,rB
Load Half Word and Zero                        lhz        rD,d(rA)
Load Half Word and Zero Indexed                lhzx       rD,rA,rB
Load Half Word and Zero with Update            lhzu       rD,d(rA)
Load Half Word and Zero with Update Indexed    lhzux      rD,rA,rB
Load Half Word Algebraic                       lha        rD,d(rA)
Load Half Word Algebraic Indexed               lhax       rD,rA,rB
Load Half Word Algebraic with Update           lhau       rD,d(rA)
Load Half Word Algebraic with Update Indexed   lhaux      rD,rA,rB
Load Word and Zero                             lwz        rD,d(rA)
Load Word and Zero Indexed                     lwzx       rD,rA,rB
Load Word and Zero with Update                 lwzu       rD,d(rA)
Load Word and Zero with Update Indexed         lwzux      rD,rA,rB
2.3.4.3.4 Integer Store Instructions
For integer store instructions, the contents of rS are stored into the byte, half word, word or double word in memory addressed by the EA (effective address). Many store instructions have an update form, in which rA is updated with the EA. For these forms, the following rules apply:
* If rA ≠ 0, the effective address is placed into rA.
* If rS = rA, the contents of register rS are copied to the target memory element, then the generated EA is placed into rA (rS).
The PowerPC architecture defines store with update instructions with rA = 0 as an invalid form. In addition, it defines integer store instructions with the CR update option enabled (Rc field, bit 31, in the instruction encoding = 1) to be an invalid form. Table 2-34 summarizes the integer store instructions.
Table 2-34. Integer Store Instructions
Name                                  Mnemonic   Syntax
Store Byte                            stb        rS,d(rA)
Store Byte Indexed                    stbx       rS,rA,rB
Store Byte with Update                stbu       rS,d(rA)
Store Byte with Update Indexed        stbux      rS,rA,rB
Store Half Word                       sth        rS,d(rA)
Store Half Word Indexed               sthx       rS,rA,rB
Store Half Word with Update           sthu       rS,d(rA)
Store Half Word with Update Indexed   sthux      rS,rA,rB
Store Word                            stw        rS,d(rA)
Store Word Indexed                    stwx       rS,rA,rB
Store Word with Update                stwu       rS,d(rA)
Store Word with Update Indexed        stwux      rS,rA,rB
2.3.4.3.5 Integer Store Gathering
The MPC750 performs store gathering for write-through stores to nonguarded space and for cache-inhibited stores to nonguarded space, provided the stores are 4 bytes long and word-aligned. These stores are combined in the load/store unit (LSU) to form a double word and are sent out on the 60x bus as a single-beat operation. However, stores can be gathered only if the successive stores that meet the criteria are queued and pending. Store gathering takes place regardless of the address order of the stores. The store gathering feature is enabled by setting HID0[SGE]. Store gathering is done for both big- and little-endian modes. Store gathering is not done for the following:
* Cacheable stores
* Stores to guarded cache-inhibited or write-through space
* Byte-reverse store
* stwcx. and ecowx accesses
* Floating-point stores
* Store operations attempted during a hardware table search
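The eligibility criteria listed above can be collected into a single predicate. The following Python sketch is illustrative only: the Store record and its field names are hypothetical, "cacheable" is treated here as meaning neither write-through nor cache-inhibited, and the additional requirement that two eligible stores be queued together and form a double word is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Store:
    size: int                          # operand size in bytes
    address: int                       # effective address
    write_through: bool = False        # W attribute of the target page
    cache_inhibited: bool = False      # I attribute of the target page
    guarded: bool = False              # G attribute of the target page
    byte_reverse: bool = False         # e.g. stwbrx
    floating_point: bool = False       # floating-point store
    stwcx_or_ecowx: bool = False       # stwcx. or ecowx access
    during_table_search: bool = False  # attempted during a hardware table search

def may_gather(store, sge_enabled):
    """True if this single store meets the per-store gathering criteria."""
    if not sge_enabled:                  # HID0[SGE] must be set
        return False
    if not (store.write_through or store.cache_inhibited):
        return False                     # ordinary cacheable store
    if store.guarded:
        return False
    if store.size != 4 or store.address % 4 != 0:
        return False                     # must be an aligned 4-byte store
    if (store.byte_reverse or store.floating_point
            or store.stwcx_or_ecowx or store.during_table_search):
        return False
    return True

print(may_gather(Store(4, 0x100, cache_inhibited=True), sge_enabled=True))
print(may_gather(Store(4, 0x100, cache_inhibited=True, guarded=True), sge_enabled=True))
```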
If store gathering is enabled and the stores do not fall under the above categories, an eieio or sync instruction must be used to prevent two stores from being gathered.
2.3.4.3.6 Integer Load and Store with Byte-Reverse Instructions
Table 2-35 describes integer load and store with byte-reverse instructions. When used in a system operating with the default big-endian byte order, these instructions have the effect of loading and storing data in little-endian order. Likewise, when used in a system operating with little-endian byte order, these instructions have the effect of loading and storing data in big-endian order. For more information about big-endian and little-endian byte ordering, see "Byte Ordering," in Chapter 3, "Operand Conventions," in the Programming Environments Manual.
Table 2-35. Integer Load and Store with Byte-Reverse Instructions
Name                                   Mnemonic   Syntax
Load Half Word Byte-Reverse Indexed    lhbrx      rD,rA,rB
Load Word Byte-Reverse Indexed         lwbrx      rD,rA,rB
Store Half Word Byte-Reverse Indexed   sthbrx     rS,rA,rB
Store Word Byte-Reverse Indexed        stwbrx     rS,rA,rB
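The effect of the byte-reverse forms in the default big-endian mode can be modeled by reading or writing the same four bytes in the opposite order. The Python sketch below assumes big-endian operation, takes the effective address as a plain integer rather than computing it from rA and rB, and uses a bytearray as a stand-in for memory; the function names simply echo the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def lwz(memory, ea):
    """Ordinary word load in big-endian mode."""
    return int.from_bytes(memory[ea:ea + 4], "big")

def lwbrx(memory, ea):
    """Byte-reversed word load: the bytes are taken in little-endian order."""
    return int.from_bytes(memory[ea:ea + 4], "little")

def stwbrx(memory, ea, value):
    """Byte-reversed word store: the bytes are written in little-endian order."""
    memory[ea:ea + 4] = (value & MASK32).to_bytes(4, "little")

memory = bytearray.fromhex("11223344")
print(hex(lwz(memory, 0)))     # bytes read most significant first
print(hex(lwbrx(memory, 0)))   # same bytes, reversed order

stwbrx(memory, 0, 0xAABBCCDD)
print(memory.hex())            # least significant byte stored first
```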
2.3.4.3.7 Integer Load and Store Multiple Instructions
The load/store multiple instructions are used to move blocks of data to and from the GPRs. The load multiple and store multiple instructions may have operands that require memory accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be interrupted by a DSI exception associated with the address translation of the second page.
Implementation Notes--The following describes the MPC750 implementation of the load/store multiple instruction:
* For load/store string operations, the hardware does not combine register values to reduce the number of discrete accesses. However, if store gathering is enabled and the accesses fall under the criteria for store gathering the stores may be combined to enhance performance. At a minimum, additional cache access cycles are required.
* The MPC750 supports misaligned, single-register load and store accesses in little-endian mode without causing an alignment exception. However, execution of misaligned load/store multiple/string operations causes an alignment exception.
The PowerPC architecture defines the load multiple word (lmw) instruction with rA in the range of registers to be loaded as an invalid form.
Table 2-36. Integer Load and Store Multiple Instructions
Name                  Mnemonic   Syntax
Load Multiple Word    lmw        rD,d(rA)
Store Multiple Word   stmw       rS,d(rA)
2.3.4.3.8 Integer Load and Store String Instructions
The integer load and store string instructions allow movement of data from memory to registers or from registers to memory without concern for alignment. These instructions can be used for a short move between arbitrary memory locations or to initiate a long move between misaligned memory fields. However, in some implementations, these instructions are likely to have greater latency and take longer to execute, perhaps much longer, than a sequence of individual load or store instructions that produce the same results. Table 2-37 summarizes the integer load and store string instructions. In other implementations operating with little-endian byte order, execution of a load or store string instruction invokes the alignment error handler; see "Byte Ordering," in the Programming Environments Manual for more information.
Table 2-37. Integer Load and Store String Instructions
Name                          Mnemonic  Syntax
Load String Word Immediate    lswi      rD,rA,NB
Load String Word Indexed      lswx      rD,rA,rB
Store String Word Immediate   stswi     rS,rA,NB
Store String Word Indexed     stswx     rS,rA,rB
Load string and store string instructions may involve operands that are not word-aligned.
As described in Section 4.5.6, "Alignment Exception (0x00600)," a misaligned string operation suffers a performance penalty compared to an aligned operation of the same type. A non-word-aligned string operation that crosses a 4-Kbyte boundary, or a word-aligned string operation that crosses a 256-Mbyte boundary, always causes an alignment exception. A non-word-aligned string operation that crosses a double-word boundary is also slower than a word-aligned string operation.
Implementation Note--The following describes the MPC750 implementation of load/store string instructions:
* For load/store string operations, the hardware does not combine register values to reduce the number of discrete accesses. However, if store gathering is enabled and the accesses fall under the criteria for store gathering, the stores may be combined to enhance performance. At a minimum, additional cache access cycles are required.
* The MPC750 supports misaligned, single-register load and store accesses in little-endian mode without causing an alignment exception. However, execution of misaligned load/store multiple/string operations causes an alignment exception.
2.3.4.3.9 Floating-Point Load and Store Address Generation
Floating-point load and store operations generate effective addresses using the register indirect with immediate index addressing mode and register indirect with index addressing mode. Floating-point loads and stores are not supported for direct-store accesses. The use of floating-point loads and stores for direct-store access results in an alignment exception.
There are two forms of the floating-point load instruction--single-precision and double-precision operand formats. Because the FPRs support only the floating-point double-precision format, single-precision floating-point load instructions convert single-precision data to double-precision format before loading an operand into an FPR.
Implementation Notes--The MPC750 treats exceptions as follows:
* The FPU can be run in two different modes--ignore exceptions mode (MSR[FE0] = MSR[FE1] = 0) and precise mode (any other settings for MSR[FE0,FE1]). For the MPC750, ignore exceptions mode allows floating-point instructions to complete earlier and thus may provide better performance than precise mode.
* The floating-point load and store indexed instructions (lfsx, lfsux, lfdx, lfdux, stfsx, stfsux, stfdx, stfdux) are invalid when the Rc bit is one. In the MPC750, executing one of these invalid instruction forms causes CR0 to be set to an undefined value.
The PowerPC architecture defines a load with update instruction with rA = 0 as an invalid form. Table 2-38 summarizes the floating-point load instructions.
Table 2-38. Floating-Point Load Instructions
Name                                             Mnemonic  Syntax
Load Floating-Point Single                       lfs       frD,d(rA)
Load Floating-Point Single Indexed               lfsx      frD,rA,rB
Load Floating-Point Single with Update           lfsu      frD,d(rA)
Load Floating-Point Single with Update Indexed   lfsux     frD,rA,rB
Load Floating-Point Double                       lfd       frD,d(rA)
Load Floating-Point Double Indexed               lfdx      frD,rA,rB
Load Floating-Point Double with Update           lfdu      frD,d(rA)
Load Floating-Point Double with Update Indexed   lfdux     frD,rA,rB
2.3.4.3.10 Floating-Point Store Instructions
This section describes floating-point store instructions. There are three basic forms of the store instruction--single-precision, double-precision, and integer. The integer form is supported by the optional stfiwx instruction. Because the FPRs support only floating-point, double-precision format for floating-point data, single-precision floating-point store instructions convert double-precision data to single-precision format before storing the operands. Table 2-39 summarizes the floating-point store instructions.
Table 2-39. Floating-Point Store Instructions
Name                                              Mnemonic    Syntax
Store Floating-Point Single                       stfs        frS,d(rA)
Store Floating-Point Single Indexed               stfsx       frS,rA,rB
Store Floating-Point Single with Update           stfsu       frS,d(rA)
Store Floating-Point Single with Update Indexed   stfsux      frS,rA,rB
Store Floating-Point Double                       stfd (1)    frS,d(rA)
Store Floating-Point Double Indexed               stfdx       frS,rA,rB
Store Floating-Point Double with Update           stfdu       frS,d(rA)
Store Floating-Point Double with Update Indexed   stfdux      frS,rA,rB
Store Floating-Point as Integer Word Indexed (2)  stfiwx      frS,rA,rB
Notes:
1 The MPC750 and MPC755 require that the FPRs be initialized with floating-point values before the stfd instruction is used. Otherwise, a random power-on value for an FPR may cause unpredictable device behavior when the stfd instruction is executed. Note that any floating-point value loaded into the FPRs is acceptable.
2 The stfiwx instruction is optional to the PowerPC architecture.
Some floating-point store instructions require conversions in the LSU. Table 2-40 shows conversions the LSU makes when executing a Store Floating-Point Single instruction.
Table 2-40. Store Floating-Point Single Behavior
FPR Precision  Data Type              Action
Single         Normalized             Store
Single         Denormalized           Store
Single         Zero, infinity, QNaN   Store
Single         SNaN                   Store
Double         Normalized             If (exp <= 896) then Denormalize and Store, else Store
Double         Denormalized           Store zero
Double         Zero, infinity, QNaN   Store
Double         SNaN                   Store
Table 2-41 shows the conversions made when performing a Store Floating-Point Double instruction. Most entries in the table indicate that the floating-point value is simply stored. Only in a few cases are any other actions taken.
Table 2-41. Store Floating-Point Double Behavior
FPR Precision  Data Type              Action
Single         Normalized             Store
Single         Denormalized           Normalize and Store
Single         Zero, infinity, QNaN   Store
Single         SNaN                   Store
Double         Normalized             Store
Double         Denormalized           Store
Double         Zero, infinity, QNaN   Store
Double         SNaN                   Store
Architecturally, all floating-point numbers are represented in double-precision format within the MPC750. Execution of a store floating-point single (stfs, stfsu, stfsx, stfsux) instruction requires conversion from double- to single-precision format. If the exponent is not greater than 896, this conversion requires denormalization. The MPC750 supports this denormalization by shifting the mantissa one bit at a time. Anywhere from 1 to 23 clock cycles are required to complete the denormalization, depending upon the value to be stored. Because of how floating-point numbers are implemented in the MPC750, there is also a case when execution of a store floating-point double (stfd, stfdu, stfdx, stfdux) instruction can require internal shifting of the mantissa. This case occurs when the operand of a store floating-point double instruction is a denormalized single-precision value. The value could be the result of a load floating-point single instruction, a single-precision arithmetic
instruction, or a floating round to single-precision instruction. In these cases, shifting the mantissa takes from 1 to 23 clock cycles, depending upon the value to be stored. These cycles are incurred during the store.
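The double-to-single conversion described for the store floating-point single instructions can be sketched in software. The following is a minimal model, not the MPC750 hardware algorithm: the function name stfs_convert is hypothetical, the bit manipulation follows the general PowerPC store-single conversion (truncation, no rounding), and the returned shift count is only assumed to correspond to the 1 to 23 denormalization cycles mentioned above.

```python
import struct

def stfs_convert(value):
    """Model of storing a double-format FPR value as a single-precision word.

    Returns (word, shifts): the 32-bit pattern that would be stored and the
    number of one-bit mantissa shifts needed (0 when no denormalization).
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF            # biased double-precision exponent
    frac = bits & ((1 << 52) - 1)

    if exp == 0 and frac != 0:
        # Table 2-40: a double-precision denormalized value is stored as zero.
        return sign << 31, 0
    if exp > 896 or exp == 0:
        # No denormalization: keep sign and exponent MSB, then the low
        # exponent bits and the top 23 fraction bits.
        word = ((bits >> 62) << 30) | ((bits >> 29) & 0x3FFFFFFF)
        return word, 0
    if exp >= 874:
        # Exponent in 874..896: shift the mantissa (implicit 1 made
        # explicit) right one bit at a time, 1 to 23 times.
        shifts = 897 - exp
        mantissa = ((1 << 52) | frac) >> shifts
        return (sign << 31) | (mantissa >> 29), shifts
    # Assumption: smaller exponents are outside the range this sketch models.
    return None, 0
```

For example, stfs_convert(2.0 ** -127) needs a single shift, while stfs_convert(2.0 ** -149) needs all 23.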
2.3.4.4 Branch and Flow Control Instructions
Some branch instructions can redirect instruction execution conditionally based on the value of bits in the CR. When the processor encounters one of these instructions, it scans the execution pipelines to determine whether an instruction in progress may affect the particular CR bit. If no interlock is found, the branch can be resolved immediately by checking the bit in the CR and taking the action defined for the branch instruction.
2.3.4.4.1 Branch Instruction Address Calculation
Branch instructions can alter the sequence of instruction execution. Instruction addresses are always assumed to be word aligned; the processors ignore the two low-order bits of the generated branch target address. Branch instructions compute the EA of the next instruction address using the following addressing modes:
* Branch relative
* Branch conditional to relative address
* Branch to absolute address
* Branch conditional to absolute address
* Branch conditional to link register
* Branch conditional to count register
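The addressing modes listed above can be illustrated with a short sketch. The function names are hypothetical, and the field widths (24-bit LI for b-type branches, 14-bit BD for bc-type branches, AA as the absolute-address bit) are taken from the general PowerPC instruction formats rather than from this section.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ mask) - mask

def branch_target(cia, li, aa):
    """b, ba, bl, bla: LI || 0b00 is sign-extended; added to CIA unless AA = 1."""
    disp = sign_extend((li & 0xFFFFFF) << 2, 26)
    base = 0 if aa else cia
    return (base + disp) & 0xFFFFFFFC

def cond_branch_target(cia, bd, aa):
    """bc, bca, bcl, bcla: BD || 0b00 is sign-extended; added to CIA unless AA = 1."""
    disp = sign_extend((bd & 0x3FFF) << 2, 16)
    base = 0 if aa else cia
    return (base + disp) & 0xFFFFFFFC

def register_branch_target(reg_value):
    """bclr, bcctr: the target is LR or CTR with the two low-order bits ignored."""
    return reg_value & 0xFFFFFFFC
```

In every case the two low-order bits of the result are zero, matching the word-alignment rule stated above.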
Note that in the MPC750, all branch instructions (b, ba, bl, bla, bc, bca, bcl, bcla, bclr, bclrl, bcctr, bcctrl) and condition register logical instructions (crand, cror, crxor, crnand, crnor, crandc, creqv, crorc, and mcrf) are executed by the BPU. Some of these instructions can redirect instruction execution conditionally based on the value of bits in the CR. Whenever the CR bits resolve, the branch direction is either marked as correct or mispredicted. Correcting a mispredicted branch requires that the MPC750 flush speculatively executed instructions and restore the machine state to immediately after the branch. This correction can be done immediately upon resolution of the condition register bits.
2.3.4.4.2 Branch Instructions
Table 2-42 lists the branch instructions provided by the processors of this family. To simplify assembly language programming, a set of simplified mnemonics and symbols is provided for the most frequently used forms of branch conditional, compare, trap, rotate and shift, and certain other instructions. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a list of simplified mnemonic examples.
Table 2-42. Branch Instructions
Name                                   Mnemonic            Syntax
Branch                                 b (ba bl bla)       target_addr
Branch Conditional                     bc (bca bcl bcla)   BO,BI,target_addr
Branch Conditional to Link Register    bclr (bclrl)        BO,BI
Branch Conditional to Count Register   bcctr (bcctrl)      BO,BI
2.3.4.4.3 Condition Register Logical Instructions
Condition register logical instructions, shown in Table 2-43, and the Move Condition Register Field (mcrf) instruction are also defined as flow control instructions.
Table 2-43. Condition Register Logical Instructions
Name                                    Mnemonic  Syntax
Condition Register AND                  crand     crbD,crbA,crbB
Condition Register OR                   cror      crbD,crbA,crbB
Condition Register XOR                  crxor     crbD,crbA,crbB
Condition Register NAND                 crnand    crbD,crbA,crbB
Condition Register NOR                  crnor     crbD,crbA,crbB
Condition Register Equivalent           creqv     crbD,crbA,crbB
Condition Register AND with Complement  crandc    crbD,crbA,crbB
Condition Register OR with Complement   crorc     crbD,crbA,crbB
Move Condition Register Field           mcrf      crfD,crfS
Note that if the LR update option is enabled for any of these instructions, the PowerPC architecture defines these forms of the instructions as invalid.
2.3.4.4.4 Trap Instructions
The trap instructions shown in Table 2-44 are provided to test for a specified set of conditions. If any of the conditions tested by a trap instruction are met, the system trap type program exception is taken. For more information, see Section 4.5.7, "Program Exception (0x00700)." If the tested conditions are not met, instruction execution continues normally.
Table 2-44. Trap Instructions
Name                 Mnemonic  Syntax
Trap Word Immediate  twi       TO,rA,SIMM
Trap Word            tw        TO,rA,rB
See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a complete set of simplified mnemonics.
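The condition test performed by the trap instructions can be sketched as follows. The meaning of the five TO bits used here (signed less than, signed greater than, equal, unsigned less than, unsigned greater than, from most to least significant) comes from the general PowerPC definition of tw and twi, not from this section; the function names are hypothetical.

```python
def to_signed32(x):
    """Reinterpret a 32-bit value as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def trap_taken(to, a, b):
    """Return True if tw TO,rA,rB would take the trap.

    a is the rA value; b is the rB value (or the sign-extended SIMM for twi).
    """
    a_u, b_u = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    a_s, b_s = to_signed32(a), to_signed32(b)
    return bool(
        (to & 0b10000 and a_s < b_s) or    # signed less than
        (to & 0b01000 and a_s > b_s) or    # signed greater than
        (to & 0b00100 and a_u == b_u) or   # equal
        (to & 0b00010 and a_u < b_u) or    # unsigned less than
        (to & 0b00001 and a_u > b_u)       # unsigned greater than
    )
```

With TO = 31 some condition always holds, so the trap is unconditional; with TO = 0 execution always continues normally.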
2.3.4.5 System Linkage Instruction--UISA
The System Call (sc) instruction permits a program to call on the system to perform a service; see Table 2-45. See also Section 2.3.6.1, "System Linkage Instructions--OEA," for additional information.
Table 2-45. System Linkage Instruction--UISA
Name         Mnemonic  Syntax
System Call  sc        --
Executing this instruction causes the system call exception handler to be invoked. For more information, see Section 4.5.10, "System Call Exception (0x00C00)."
2.3.4.6 Processor Control Instructions--UISA
Processor control instructions are used to read from and write to the condition register (CR), machine state register (MSR), and special-purpose registers (SPRs). See Section 2.3.5.1, "Processor Control Instructions--VEA," for the mftb instruction and Section 2.3.6.2, "Processor Control Instructions--OEA," for information about the instructions used for reading from and writing to the MSR and SPRs.
2.3.4.6.1 Move to/from Condition Register Instructions
Table 2-46 summarizes the instructions for reading from or writing to the condition register.
Table 2-46. Move to/from Condition Register Instructions
Name                                  Mnemonic  Syntax
Move to Condition Register Fields     mtcrf     CRM,rS
Move to Condition Register from XER   mcrxr     crfD
Move from Condition Register          mfcr      rD
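The field selection performed by mtcrf can be sketched as a mask operation. This is a minimal model based on the general PowerPC definition of the instruction (each of the eight CRM bits selects one 4-bit CR field, with the most-significant CRM bit selecting CR0); the function name and argument order are illustrative only.

```python
def mtcrf(cr, crm, rs):
    """Return the new 32-bit CR after mtcrf CRM,rS.

    cr  -- current CR value
    crm -- 8-bit field mask; bit 0x80 selects CR0, bit 0x01 selects CR7
    rs  -- source register value
    """
    mask = 0
    for field in range(8):
        if crm & (0x80 >> field):
            mask |= 0xF << (28 - 4 * field)   # the four bits of this CR field
    return (rs & mask) | (cr & ~mask & 0xFFFFFFFF)
```

A CRM of 0xFF replaces the whole CR, while a CRM of 0x80 updates only CR0 and leaves the other seven fields untouched.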
Implementation Note--The PowerPC architecture indicates that in some implementations the Move to Condition Register Fields (mtcrf) instruction may perform more slowly when only a portion of the fields are updated as opposed to all of the fields. The condition register access latency for the MPC750 is the same in both cases.
2.3.4.6.2 Move to/from Special-Purpose Register Instructions (UISA)
Table 2-47 lists the mtspr and mfspr instructions.
Table 2-47. Move to/from Special-Purpose Register Instructions (UISA)
Name                                Mnemonic  Syntax
Move to Special-Purpose Register    mtspr     SPR,rS
Move from Special-Purpose Register  mfspr     rD,SPR
Table 2-48 lists the SPR numbers for both user- and supervisor-level accesses.
Table 2-48. PowerPC Encodings
Register Name  SPR Decimal (1)  spr[5-9]  spr[0-4]  Access            mfspr/mtspr
CTR            9                00000     01001     User (UISA)       Both
DABR           1013             11111     10101     Supervisor (OEA)  Both
DAR            19               00000     10011     Supervisor (OEA)  Both
DBAT0L         537              10000     11001     Supervisor (OEA)  Both
DBAT0U         536              10000     11000     Supervisor (OEA)  Both
DBAT1L         539              10000     11011     Supervisor (OEA)  Both
DBAT1U         538              10000     11010     Supervisor (OEA)  Both
DBAT2L         541              10000     11101     Supervisor (OEA)  Both
DBAT2U         540              10000     11100     Supervisor (OEA)  Both
DBAT3L         543              10000     11111     Supervisor (OEA)  Both
DBAT3U         542              10000     11110     Supervisor (OEA)  Both
DEC            22               00000     10110     Supervisor (OEA)  Both
DSISR          18               00000     10010     Supervisor (OEA)  Both
EAR            282              01000     11010     Supervisor (OEA)  Both
IBAT0L         529              10000     10001     Supervisor (OEA)  Both
IBAT0U         528              10000     10000     Supervisor (OEA)  Both
IBAT1L         531              10000     10011     Supervisor (OEA)  Both
IBAT1U         530              10000     10010     Supervisor (OEA)  Both
IBAT2L         533              10000     10101     Supervisor (OEA)  Both
IBAT2U         532              10000     10100     Supervisor (OEA)  Both
IBAT3L         535              10000     10111     Supervisor (OEA)  Both
IBAT3U         534              10000     10110     Supervisor (OEA)  Both
LR             8                00000     01000     User (UISA)       Both
PVR            287              01000     11111     Supervisor (OEA)  mfspr
SDR1           25               00000     11001     Supervisor (OEA)  Both
SPRG0          272              01000     10000     Supervisor (OEA)  Both
SPRG1          273              01000     10001     Supervisor (OEA)  Both
SPRG2          274              01000     10010     Supervisor (OEA)  Both
SPRG3          275              01000     10011     Supervisor (OEA)  Both
SRR0           26               00000     11010     Supervisor (OEA)  Both
SRR1           27               00000     11011     Supervisor (OEA)  Both
TBL (2)        268              01000     01100     Supervisor (OEA)  mtspr
               284              01000     11100     Supervisor (OEA)  mtspr
TBU (2)        269              01000     01101     Supervisor (OEA)  mtspr
               285              01000     11101     Supervisor (OEA)  mtspr
XER            1                00000     00001     User (UISA)       Both
Notes:
1 The order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding. For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order five bits appearing in bits 16-20 of the instruction and the low-order five bits in bits 11-15.
2 The TB registers are referred to as TBRs rather than SPRs and can be written to using the mtspr instruction in supervisor mode and the TBR numbers here. The TB registers can be read in user mode using either the mftb or mfspr instruction and specifying TBR 268 for TBL and TBR 269 for TBU.
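The half-swapping described in Note 1 can be sketched as follows. The helper names are hypothetical; the primary opcode (31) and the extended opcodes (339 for mfspr, 467 for mtspr) are taken from the general PowerPC instruction encoding rather than from this section.

```python
def spr_field(spr):
    """Return the 10-bit SPR field as it appears in instruction bits 11-20.

    The low-order five bits of the SPR number go to bits 11-15 and the
    high-order five bits go to bits 16-20.
    """
    high, low = (spr >> 5) & 0x1F, spr & 0x1F
    return (low << 5) | high

def encode_mfspr(rd, spr):
    """Encode mfspr rD,SPR as a 32-bit instruction word."""
    return (31 << 26) | (rd << 21) | (spr_field(spr) << 11) | (339 << 1)

def encode_mtspr(spr, rs):
    """Encode mtspr SPR,rS as a 32-bit instruction word."""
    return (31 << 26) | (rs << 21) | (spr_field(spr) << 11) | (467 << 1)
```

For example, encode_mfspr(0, 8) reads the LR (SPR 8) into r0; the SPR number 8 (00000 01000) appears in the instruction as 01000 00000.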
Encodings for the MPC750-specific SPRs are listed in Table 2-49.
Table 2-49. SPR Encodings for MPC750-Defined Registers (mfspr)
Register Name  SPR Decimal (1)  spr[5-9]  spr[0-4]  Access      mfspr/mtspr
DABR           1013             11111     10101     Supervisor  Both
HID0           1008             11111     10000     Supervisor  Both
HID1           1009             11111     10001     Supervisor  Both
IABR           1010             11111     10010     Supervisor  Both
ICTC           1019             11111     11011     Supervisor  Both
L2CR           1017             11111     11001     Supervisor  Both
MMCR0          952              11101     11000     Supervisor  Both
MMCR1          956              11101     11100     Supervisor  Both
PMC1           953              11101     11001     Supervisor  Both
PMC2           954              11101     11010     Supervisor  Both
PMC3           957              11101     11101     Supervisor  Both
PMC4           958              11101     11110     Supervisor  Both
SIA            955              11101     11011     Supervisor  Both
THRM1          1020             11111     11100     Supervisor  Both
THRM2          1021             11111     11101     Supervisor  Both
THRM3          1022             11111     11110     Supervisor  Both
UMMCR0         936              11101     01000     User        mfspr
UMMCR1         940              11101     01100     User        mfspr
UPMC1          937              11101     01001     User        mfspr
UPMC2          938              11101     01010     User        mfspr
UPMC3          941              11101     01101     User        mfspr
UPMC4          942              11101     01110     User        mfspr
USIA           939              11101     01011     User        mfspr
Note:
1 Note that the order of the two 5-bit halves of the SPR number is reversed compared with actual instruction coding. For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16-20 of the instruction and the low-order 5 bits in bits 11-15.
2.3.4.7 Memory Synchronization Instructions--UISA
Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, "L1 Instruction and Data Cache Operation," for additional information about these instructions and about related aspects of memory synchronization. See Table 2-50 for a summary.
Table 2-50. Memory Synchronization Instructions--UISA
Name                             Mnemonic  Syntax
Load Word and Reserve Indexed    lwarx     rD,rA,rB
Store Word Conditional Indexed   stwcx.    rS,rA,rB
Synchronize                      sync      --
Implementation Notes (lwarx, stwcx.): Programmers can use lwarx with stwcx. to emulate common semaphore operations such as test and set, compare and swap, exchange memory, and fetch and add. Both instructions must use the same EA. Reservation granularity is implementation-dependent. The MPC750 makes reservations on behalf of aligned 32-byte sections of the memory address space. If the W bit is set, executing lwarx and stwcx. to a page marked write-through does not cause a DSI exception, but DSI exceptions can result for other reasons. If the location is not word-aligned, an alignment exception occurs. The stwcx. instruction is the only load/store instruction with a valid form if Rc is set. If Rc is zero, executing stwcx. sets CR0 to an undefined value. In general, stwcx. always causes a transaction on the external bus and thus operates with slightly worse performance characteristics than normal store operations.
Implementation Notes (sync): Because it delays subsequent instructions until all previous instructions complete to where they cannot cause an exception, sync is a barrier against store gathering. Additionally, all load/store cache/bus activities initiated by prior instructions are completed. Touch load operations (dcbt, dcbtst) must complete address translation, but need not complete on the bus. If HID0[ABE] = 1, sync completes after a successful broadcast. The latency of sync depends on the processor state when it is dispatched and on various system-level situations. Therefore, frequent use of sync may degrade performance.
System designs with an L2 cache should take special care to recognize the hardware signaling caused by a SYNC bus operation and perform the appropriate actions to guarantee that memory references that may be queued internally to the L2 cache have been performed globally. See 2.3.5.2, "Memory Synchronization Instructions--VEA," for details about additional memory synchronization (eieio and isync) instructions. In the PowerPC architecture, the Rc bit must be zero for most load and store instructions. If Rc is set, the instruction form is invalid for sync and lwarx instructions. If the MPC750 encounters one of these invalid instruction forms, it sets CR0 to an undefined value.
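The way lwarx and stwcx. combine into a semaphore-style operation such as fetch and add can be illustrated with a toy software model. Everything here is hypothetical (the ToyMemory class, its method names, and the snooped_store helper); the only values taken from the text are the 32-byte reservation granularity and the word-alignment requirement. Treating a store from another bus master as clearing the reservation follows general PowerPC behavior and is an assumption with respect to this section.

```python
GRANULE = 32  # MPC750 reservation granularity in bytes

class ToyMemory:
    """Toy single-processor model of the lwarx/stwcx. reservation."""

    def __init__(self):
        self.words = {}           # word-aligned address -> 32-bit value
        self.reservation = None   # granule-aligned address, or None

    def lwarx(self, ea):
        if ea % 4:
            raise ValueError("alignment exception: EA is not word-aligned")
        self.reservation = ea & ~(GRANULE - 1)
        return self.words.get(ea, 0)

    def stwcx(self, ea, value):
        """Return True if the store was performed (reservation still held)."""
        if ea % 4:
            raise ValueError("alignment exception: EA is not word-aligned")
        ok = self.reservation == (ea & ~(GRANULE - 1))
        if ok:
            self.words[ea] = value & 0xFFFFFFFF
        self.reservation = None   # the reservation is consumed either way
        return ok

    def snooped_store(self, ea, value):
        """A store by another bus master; clears a matching reservation."""
        self.words[ea & ~3] = value & 0xFFFFFFFF
        if self.reservation == (ea & ~(GRANULE - 1)):
            self.reservation = None

def fetch_and_add(mem, ea, delta):
    """Retry loop: reload and retry until the conditional store succeeds."""
    while True:
        old = mem.lwarx(ea)
        if mem.stwcx(ea, old + delta):
            return old
```

Note that in this model a snooped store anywhere in the same 32-byte section defeats the reservation, which is why unrelated data sharing a section with a semaphore can cause extra retries.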
2.3.5 PowerPC VEA Instructions
The PowerPC virtual environment architecture (VEA) describes the semantics of the memory model that can be assumed by software processes, and includes descriptions of the cache model, cache control instructions, address aliasing, and other related issues. Implementations that conform to the VEA also adhere to the UISA, but may not necessarily adhere to the OEA. This section describes additional instructions that are provided by the VEA.
2.3.5.1 Processor Control Instructions--VEA
In addition to the move to condition register instructions (specified by the UISA), the VEA defines the mftb instruction (user-level instruction) for reading the contents of the time base register; see Chapter 3, "L1 Instruction and Data Cache Operation," for more information. Table 2-51 shows the mftb instruction.
Table 2-51. Move from Time Base Instruction
Name                 Mnemonic  Syntax
Move from Time Base  mftb      rD,TBR
Simplified mnemonics are provided for the mftb instruction so it can be coded with the TBR name as part of the mnemonic rather than requiring it to be coded as an operand. See
Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for simplified mnemonic examples and for simplified mnemonics for Move from Time Base (mftb) and Move from Time Base Upper (mftbu), which are variants of the mftb instruction rather than of mfspr. The mftb instruction serves as both a basic and simplified mnemonic. Assemblers recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the simplified form. Note that the MPC750 ignores the extended opcode differences between mftb and mfspr by ignoring bit 25 and treating both instructions identically.
Implementation Notes--The following information is useful with respect to using the time base implementation in the MPC750:
* The MPC750 allows user-mode read access to the time base counter through the use of the Move from Time Base (mftb) and the Move from Time Base Upper (mftbu) instructions. As a 32-bit implementation, the MPC750 can access TBU and TBL only separately, whereas 64-bit implementations can access the entire TB register at once.
* The time base counter is clocked at a frequency that is one-fourth that of the bus clock. Counting is enabled by assertion of the time base enable (TBE) input signal.
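Because TBU and TBL can only be read separately, software usually reads the upper half, the lower half, and the upper half again, and retries if the upper half changed in between. The sketch below models that commonly used sequence; the ToyTimeBase class is a hypothetical stand-in for the hardware counter (it advances one tick per read so that the carry case can be exercised), and ticks_to_seconds simply applies the one-fourth bus clock ratio stated above.

```python
class ToyTimeBase:
    """Hypothetical 64-bit time base that advances one tick on every read."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFFFFFFFFFF

    def _tick(self):
        self.value = (self.value + 1) & 0xFFFFFFFFFFFFFFFF

    def mftbu(self):
        upper = self.value >> 32
        self._tick()
        return upper

    def mftb(self):
        lower = self.value & 0xFFFFFFFF
        self._tick()
        return lower

def read_timebase(tb):
    """Return a consistent 64-bit value from two 32-bit reads."""
    while True:
        upper = tb.mftbu()
        lower = tb.mftb()
        if tb.mftbu() == upper:       # no carry from TBL into TBU in between
            return (upper << 32) | lower

def ticks_to_seconds(ticks, bus_hz):
    """The time base counts at one-fourth of the bus clock frequency."""
    return ticks * 4 / bus_hz
```

With a 100-MHz bus clock, for instance, 25,000,000 time base ticks correspond to one second.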
2.3.5.2 Memory Synchronization Instructions--VEA
Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, "L1 Instruction and Data Cache Operation," for more information about these instructions and about related aspects of memory synchronization. In addition to the sync instruction (specified by UISA), the VEA defines the Enforce In-Order Execution of I/O (eieio) and Instruction Synchronize (isync) instructions. The number of cycles required to complete an eieio instruction depends on system parameters and on the processor's state when the instruction is issued. As a result, frequent use of this instruction may degrade performance slightly. Table 2-52 describes the memory synchronization instructions defined by the VEA.
Table 2-52. Memory Synchronization Instructions--VEA
Name                               Mnemonic  Syntax
Enforce In-Order Execution of I/O  eieio     --
Instruction Synchronize            isync     --
Implementation Notes (eieio): The eieio instruction is dispatched to the LSU and executes after all previous cache-inhibited or write-through accesses are performed; all subsequent instructions that generate such accesses execute after eieio. If HID0[ABE] = 1, an EIEIO operation is broadcast on the external bus to enforce ordering in the external memory system. The eieio operation bypasses the L2 cache and is forwarded to the bus unit. If HID0[ABE] = 0, the operation is not broadcast. Because the MPC750 does not reorder noncacheable accesses, eieio is not needed to force ordering. However, if store gathering is enabled and an eieio is detected in a store queue, stores are not gathered. If HID0[ABE] = 1, broadcasting eieio prevents external devices, such as a bus bridge chip, from gathering stores.
Implementation Notes (isync): The isync instruction is refetch serializing; that is, it causes the MPC750 to purge its instruction queue and wait for all prior instructions to complete before refetching the next instruction, which is not executed until all previous instructions complete to the point where they cannot cause an exception. The isync instruction does not wait for all pending stores in the store queue to complete. Any instruction after an isync sees all effects of prior instructions.
2.3.5.3 Memory Control Instructions--VEA
Memory control instructions can be classified as follows:
* Cache management instructions (user-level and supervisor-level)
* Segment register manipulation instructions (OEA)
* Translation lookaside buffer management instructions (OEA)
This section describes the user-level cache management instructions defined by the VEA. See Section 2.3.6.3, "Memory Control Instructions--OEA," for information about supervisor-level cache, segment register manipulation, and translation lookaside buffer management instructions.
2.3.5.3.1 User-Level Cache Instructions--VEA
The instructions summarized in this section help user-level programs manage on-chip caches if they are implemented. See Chapter 3, "L1 Instruction and Data Cache Operation," for more information about cache topics. The following sections describe how these operations are treated with respect to the MPC750's cache. As with other memory-related instructions, the effects of cache management instructions on memory are weakly-ordered. If the programmer must ensure that cache or other instructions have been performed with respect to all other processors and system mechanisms, a sync instruction must be placed after those instructions. Note that the MPC750 interprets cache control instructions (icbi, dcbi, dcbf, dcbz, and dcbst) as if they pertain only to the local L1 and L2 cache. A dcbz (with M set) is always broadcast on the 60x bus. The dcbi, dcbf, and dcbst operations are broadcast if HID0[ABE] is set.
The MPC750 never broadcasts an icbi. Of the broadcast cache operations, the MPC750 snoops only dcbz, regardless of the HID0[ABE] setting. Any bus activity caused by other cache instructions results directly from performing the operation on the MPC750 cache. All cache control instructions to T = 1 space are no-ops. For information about how cache control instructions affect the L2, see Chapter 9, "L2 Cache Interface Operation." Table 2-53 summarizes the cache instructions defined by the VEA. Note that these instructions are accessible to user-level programs.
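The broadcast rules stated in this section can be collected into a small decision function. This is only a restatement of the prose as a sketch; the function name and the string arguments are hypothetical, abe stands for HID0[ABE], and m stands for the M (coherency enforced) attribute of the accessed page.

```python
def is_broadcast(op, abe, m):
    """Return True if the MPC750 broadcasts the given cache operation.

    op  -- one of "dcbz", "dcbi", "dcbf", "dcbst", "icbi"
    abe -- HID0[ABE] setting (True or False)
    m   -- M bit of the accessed page (only relevant to dcbz)
    """
    if op == "icbi":
        return False                 # never broadcast
    if op == "dcbz":
        return bool(m)               # broadcast with M set, regardless of ABE
    if op in ("dcbi", "dcbf", "dcbst"):
        return bool(abe)             # broadcast only if HID0[ABE] is set
    raise ValueError("not a cache control instruction covered here: " + op)
```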
Table 2-53. User-Level Cache Instructions
Name: Data Cache Block Touch (1)
Mnemonic: dcbt
Syntax: rA,rB
Implementation Notes: The VEA defines this instruction to allow for potential system performance enhancements through the use of software-initiated prefetch hints. Implementations are not required to take any action based on execution of this instruction, but they may prefetch the cache block corresponding to the EA into their cache. When dcbt executes, the MPC750 checks for protection violations (as for a load instruction). This instruction is treated as a no-op for the following cases:
* A valid translation is not found either in BAT or TLB.
* The access causes a protection violation.
* The page is mapped cache-inhibited, G = 1 (guarded), or T = 1.
* The cache is locked or disabled.
* HID0[NOOPTI] = 1.
Otherwise, if no data is in the cache location, the MPC750 requests a cache line fill (with intent to modify). Data brought into the cache is validated as if it were a load instruction. The memory reference of a dcbt sets the reference bit.
Name: Data Cache Block Touch for Store (1)
Mnemonic: dcbtst
Syntax: rA,rB
Implementation Notes: This instruction behaves like dcbt.
Name: Data Cache Block Set to Zero
Mnemonic: dcbz
Syntax: rA,rB
Implementation Notes: The EA is computed, translated, and checked for protection violations. For cache hits, four beats of zeros are written to the cache block and the tag is marked M. For cache misses with the replacement block marked E, the zero line fill is performed and the cache block is marked M. However, if the replacement block is marked M, the contents are written back to memory first. The instruction executes regardless of whether the cache is locked; if the cache is disabled, an alignment exception occurs. If M = 1 (coherency enforced), the address is broadcast to the bus before the zero line fill. The exception priorities (from highest to lowest) are as follows:
1 Cache disabled--Alignment exception
2 Page marked write-through or cache-inhibited--Alignment exception
3 BAT protection violation--DSI exception
4 TLB protection violation--DSI exception
dcbz is the only cache instruction that broadcasts even if HID0[ABE] = 0.
Table 2-53. User-Level Cache Instructions (continued)
Name: Data Cache Block Store
Mnemonic: dcbst
Syntax: rA,rB
Implementation Notes: The EA is computed, translated, and checked for protection violations.
* For cache hits with the tag marked E, no further action is taken.
* For cache hits with the tag marked M, the cache block is written back to memory and marked E.
A dcbst is not broadcast unless HID0[ABE] = 1 regardless of WIMG settings. The instruction acts like a load with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked. The exception priorities (from highest to lowest) for dcbst are as follows:
1 BAT protection violation--DSI exception
2 TLB protection violation--DSI exception

Name: Data Cache Block Flush
Mnemonic: dcbf
Syntax: rA,rB
Implementation Notes: The EA is computed, translated, and checked for protection violations.
* For cache hits with the tag marked M, the cache block is written back to memory and the cache entry is invalidated.
* For cache hits with the tag marked E, the entry is invalidated.
* For cache misses, no further action is taken.
A dcbf is not broadcast unless HID0[ABE] = 1 regardless of WIMG settings. The instruction acts like a load with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked. The exception priorities (from highest to lowest) for dcbf are as follows:
1 BAT protection violation--DSI exception
2 TLB protection violation--DSI exception

Name: Instruction Cache Block Invalidate
Mnemonic: icbi
Syntax: rA,rB
Implementation Notes: This instruction performs a virtual lookup into the instruction cache (index only). The address is not translated, so it cannot cause an exception. All ways of a selected set are invalidated regardless of whether the cache is disabled or locked. The MPC750 never broadcasts icbi onto the 60x bus.
Note 1: A program that uses dcbt and dcbtst instructions improperly performs less efficiently. To improve performance, HID0[NOOPTI] may be set, which causes dcbt and dcbtst to be no-oped at the cache. They do not cause bus activity and cause only a 1-clock execution latency. The default state of this bit is zero, which enables the use of these instructions.
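The no-op cases listed for dcbt in Table 2-53, together with the HID0[NOOPTI] note, can be read as a single predicate. The following C sketch is a host-side model only; the structure touch_ctx and its field names are hypothetical and do not correspond to any register layout on the MPC750.

```c
#include <stdbool.h>

/* Hypothetical summary of the conditions examined when dcbt executes.
 * Field names are illustrative; they are not hardware register fields. */
struct touch_ctx {
    bool translation_found;    /* valid translation found in BAT or TLB */
    bool protection_violation; /* access would cause a protection violation */
    bool cache_inhibited;      /* page mapped cache-inhibited */
    bool guarded;              /* G = 1 */
    bool direct_store;         /* T = 1 */
    bool cache_locked;         /* data cache locked */
    bool cache_disabled;       /* data cache disabled */
    bool hid0_noopti;          /* HID0[NOOPTI] = 1 */
};

/* Returns true when the touch is treated as a no-op. */
static bool dcbt_is_noop(const struct touch_ctx *c)
{
    return !c->translation_found || c->protection_violation ||
           c->cache_inhibited || c->guarded || c->direct_store ||
           c->cache_locked || c->cache_disabled || c->hid0_noopti;
}
```

Because dcbtst is described as behaving like dcbt, the same predicate would apply to it under this model.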
2.3.5.4
Optional External Control Instructions
The PowerPC architecture defines an optional external control feature that, if implemented, is supported by the two external control instructions, eciwx and ecowx. These instructions allow a user-level program to communicate with a special-purpose device. These instructions are provided and are summarized in Table 2-54.
Table 2-54. External Control Instructions
Name                                Mnemonic   Syntax
External Control In Word Indexed    eciwx      rD,rA,rB
External Control Out Word Indexed   ecowx      rS,rA,rB
Implementation Notes (both instructions): A transfer size of 4 bytes is implied; the TBST and TSIZ[0-2] signals are redefined to specify the Resource ID (RID), copied from bits EAR[28-31]. For these operations, TBST carries the EAR[28] data. Misaligned operands for these instructions cause an alignment exception. Addressing a location where SR[T] = 1 causes a DSI exception. If MSR[DR] = 0 a programming error occurs and the physical address on the bus is undefined.
Note: These instructions are optional to the PowerPC architecture.
The eciwx/ecowx instructions let a system designer map special devices in an alternative way. The MMU translation of the EA is not used to select the special device, as it is used in most instructions such as loads and stores. Rather, it is used as an address operand that is passed to the device over the address bus. Four other signals (the burst and size signals on the 60x bus) are used to select the device; these four signals output the 4-bit resource ID (RID) field located in the EAR. The eciwx instruction also loads a word from the data bus that is output by the special device. For more information about the relationship between these instructions and the system interface, refer to Chapter 7, "Signal Descriptions."
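As a rough illustration of the RID field, the C sketch below extracts EAR[28-31] from a 32-bit EAR value and splits it into the bit carried on TBST and the bits carried on TSIZ[0-2]. The function names are hypothetical. The mapping of EAR[29-31] onto TSIZ[0-2] in that order is an assumption, since the text states only that TBST carries EAR[28]. Electrical signal polarity is not modeled.

```c
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end, so EAR[28-31]
 * is the low-order nibble of the 32-bit register value. */
static unsigned ear_rid(uint32_t ear)
{
    return ear & 0xFu;
}

/* Logical value carried on TBST: EAR[28], the high-order RID bit. */
static unsigned rid_tbst_bit(unsigned rid)
{
    return (rid >> 3) & 1u;
}

/* Assumption: TSIZ[0-2] carry the remaining bits EAR[29-31], in order. */
static unsigned rid_tsiz_bits(unsigned rid)
{
    return rid & 0x7u;
}
```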
2.3.6
PowerPC OEA Instructions
The PowerPC operating environment architecture (OEA) includes the structure of the memory management model, supervisor-level registers, and the exception model. Implementations that conform to the OEA also adhere to the UISA and the VEA. This section describes the instructions provided by the OEA.
2.3.6.1
System Linkage Instructions--OEA
This section describes the system linkage instructions (see Table 2-55). The user-level sc instruction lets a user program call on the system to perform a service and causes the processor to take a system call exception. The supervisor-level rfi instruction is used for returning from an exception handler.
Table 2-55. System Linkage Instructions--OEA
Name: System Call
Mnemonic: sc
Syntax: --
Implementation Notes: The sc instruction is context-synchronizing.

Name: Return from Interrupt
Mnemonic: rfi
Syntax: --
Implementation Notes: The rfi instruction is context-synchronizing. For the MPC750, this means the rfi instruction works its way to the final stage of the execution pipeline, updates architected registers, and redirects the instruction flow.
2.3.6.2
Processor Control Instructions--OEA
This section describes the processor control instructions used to access the MSR and the SPRs. Table 2-56 lists instructions for accessing the MSR.
Table 2-56. Move to/from Machine State Register Instructions
Name                               Mnemonic   Syntax
Move to Machine State Register     mtmsr      rS
Move from Machine State Register   mfmsr      rD
The OEA defines encodings of mtspr and mfspr to provide access to supervisor-level registers. The instructions are listed in Table 2-57.
Table 2-57. Move to/from Special-Purpose Register Instructions (OEA)
Name                                 Mnemonic   Syntax
Move to Special-Purpose Register     mtspr      SPR,rS
Move from Special-Purpose Register   mfspr      rD,SPR
Encodings for the architecture-defined SPRs are listed in Table 2-48. Encodings for MPC750-specific, supervisor-level SPRs are listed in Table 2-49. Simplified mnemonics are provided for mtspr and mfspr in Appendix F, "Simplified Mnemonics," in the Programming Environments Manual. For a discussion of context synchronization requirements when altering certain SPRs, refer to Appendix E, "Synchronization Programming Examples," in the Programming Environments Manual.
2.3.6.3
Memory Control Instructions--OEA
Memory control instructions include the following:
* Cache management instructions (supervisor-level and user-level)
* Segment register manipulation instructions
* Translation lookaside buffer management instructions
This section describes supervisor-level memory control instructions. Section 2.3.5.3, "Memory Control Instructions--VEA," describes user-level memory control instructions.
2.3.6.3.1
Supervisor-Level Cache Management Instruction--(OEA)
Table 2-58 lists the only supervisor-level cache management instruction.
Table 2-58. Supervisor-Level Cache Management Instruction
Name: Data Cache Block Invalidate
Mnemonic: dcbi
Syntax: rA,rB
Implementation Notes: The EA is computed, translated, and checked for protection violations. For cache hits, the cache block is marked I regardless of whether it was marked E or M. A dcbi is not broadcast unless HID0[ABE] = 1, regardless of WIMG settings. The instruction acts like a store with respect to address translation and memory protection. It executes regardless of whether the cache is disabled or locked. The exception priorities (from highest to lowest) for dcbi are as follows:
1 BAT protection violation--DSI exception
2 TLB protection violation--DSI exception
See Section 2.3.5.3.1, "User-Level Cache Instructions--VEA," for cache instructions that provide user-level programs the ability to manage the on-chip caches. If the effective address references a direct-store segment, the instruction is treated as a no-op.
2.3.6.3.2
Segment Register Manipulation Instructions (OEA)
The instructions listed in Table 2-59 provide access to the segment registers for 32-bit implementations. These instructions operate completely independently of the MSR[IR] and MSR[DR] bit settings. Refer to "Synchronization Requirements for Special Registers and for Lookaside Buffers," in Chapter 2, "PowerPC Register Set," of the Programming Environments Manual for serialization requirements and other recommended precautions to observe when manipulating the segment registers.
Table 2-59. Segment Register Manipulation Instructions
Name                                  Mnemonic     Syntax   Implementation Notes
Move to Segment Register              mtsr (1)     SR,rS    --
Move to Segment Register Indirect     mtsrin (1)   rS,rB    --
Move from Segment Register            mfsr         rD,SR    The shadow SRs in the instruction MMU can be read by setting HID0[RISEG] before executing mfsr.
Move from Segment Register Indirect   mfsrin       rD,rB    --
Notes:
1 The MPC750 and MPC755 have a restriction on the use of the mtsr and mtsrin instructions not described in the Programming Environments Manual. The MPC750 and MPC755 require that an isync instruction be executed after either an mtsr or mtsrin instruction. This isync instruction must occur after the execution of the mtsr or mtsrin and before the data address translation mechanism uses any of the on-chip segment registers.
2.3.6.3.3
Translation Lookaside Buffer Management Instructions--(OEA)
The address translation mechanism is defined in terms of the segment descriptors and page table entries (PTEs) that the PowerPC architecture defines for locating the logical-to-physical address mapping for a particular access. These segment descriptors and PTEs reside in segment registers and page tables in memory, respectively. See Chapter 5, "Memory Management," for more information about TLB operations. Table 2-60 summarizes the operation of the TLB instructions in the MPC750.
Table 2-60. Translation Lookaside Buffer Management Instruction
Name: TLB Invalidate Entry
Mnemonic: tlbie
Syntax: rB
Implementation Notes: Invalidates both ways in both instruction and data TLB entries at the index provided by EA[14-19]. It executes regardless of the MSR[DR] and MSR[IR] settings. To invalidate all entries in both TLBs, the programmer should issue 64 tlbie instructions that each successively increment this field.

Name: TLB Synchronize
Mnemonic: tlbsync
Syntax: --
Implementation Notes: On the MPC750, the only function tlbsync serves is to wait for the TLBISYNC signal to go inactive.
Implementation Note--The tlbia instruction is optional for an implementation if its effects can be achieved through some other mechanism. Therefore, it is not implemented on the MPC750. As described above, tlbie can be used to invalidate a particular index of the TLB based on EA[14-19]--a sequence of 64 tlbie instructions followed by a tlbsync instruction invalidates all the TLB structures (for EA[14-19] = 0, 1, 2,..., 63). Attempting to execute tlbia causes an illegal instruction program exception. The presence and exact semantics of the TLB management instructions are implementation-dependent. To minimize compatibility problems, system software should incorporate uses of these instructions into subroutines.
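The 64-step invalidation sequence described above can be sketched in C as follows. This is a host-side model under stated assumptions: tlbie_fn and tlbsync_fn are hypothetical hooks standing in for the tlbie and tlbsync instructions, which a real system would issue from supervisor-level assembly. The shift of 12 follows from PowerPC bit numbering, in which EA[19] is bit 12 counted from the least-significant end.

```c
#include <stdint.h>

#define TLB_INDEX_COUNT 64u   /* EA[14-19] is a 6-bit field */
#define TLB_INDEX_SHIFT 12u   /* EA[19] is bit 12 counted from the LSB */

/* Effective address whose EA[14-19] field equals 'index' (0..63). */
static uint32_t tlbie_ea_for_index(unsigned index)
{
    return (uint32_t)(index & (TLB_INDEX_COUNT - 1u)) << TLB_INDEX_SHIFT;
}

/* Hypothetical hooks standing in for the tlbie and tlbsync instructions. */
typedef void (*tlbie_fn)(uint32_t ea);
typedef void (*tlbsync_fn)(void);

/* Issues one tlbie per index, then a single tlbsync.
 * Returns the number of tlbie operations issued. */
static unsigned tlb_invalidate_all(tlbie_fn tlbie, tlbsync_fn tlbsync)
{
    unsigned issued = 0;
    for (unsigned i = 0; i < TLB_INDEX_COUNT; i++) {
        tlbie(tlbie_ea_for_index(i));
        issued++;
    }
    tlbsync();
    return issued;
}
```

Keeping the loop in one routine is in line with the recommendation above that system software place uses of the TLB management instructions in subroutines.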
2.3.7
Recommended Simplified Mnemonics
To simplify assembly language coding, a set of alternative mnemonics is provided for some frequently used operations (such as no-op, load immediate, load address, move register, and complement register). Programs written to be portable across the various assemblers for the PowerPC architecture should not assume the existence of mnemonics not described in this document. For a complete list of simplified mnemonics, see Appendix F, "Simplified Mnemonics," in the Programming Environments Manual.
Chapter 3 L1 Instruction and Data Cache Operation
This chapter describes the on-chip instruction and data caches of the MPC750. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor." The MPC750 microprocessor contains separate 32-Kbyte, eight-way set associative instruction and data caches to allow the execution units and registers rapid access to instructions and data. This chapter describes the organization of the on-chip instruction and data caches, the MEI cache coherency protocol, cache control instructions, various cache operations, and the interaction between the caches, the load/store unit (LSU), the instruction unit, and the bus interface unit (BIU). Note that in this chapter, the term `multiprocessor' is used in the context of maintaining cache coherency. These multiprocessor devices could be actual processors or other devices that can access system memory, maintain their own caches, and function as bus masters requiring cache coherency. The MPC750 cache implementation has the following characteristics:
* There are two separate 32-Kbyte instruction and data caches (Harvard architecture).
* Both instruction and data caches are eight-way set associative.
* The caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each set.
* The cache directories are physically addressed. The physical (real) address tag is stored in the cache directory.
* Both the instruction and data caches have 32-byte cache blocks. A cache block is the block of memory that a coherency state describes, also referred to as a cache line.
* Two coherency state bits for each data cache block allow encoding for three states:
  -- Modified (Exclusive) (M)
  -- Exclusive (Unmodified) (E)
  -- Invalid (I)
* A single coherency state bit for each instruction cache block allows encoding for two possible states:
  -- Invalid (INV)
  -- Valid (VAL)
* Each cache can be invalidated or locked by setting the appropriate bits in the hardware implementation-dependent register 0 (HID0), a special-purpose register (SPR) specific to the MPC750.
The MPC750 supports a fully-coherent 4-Gbyte physical memory address space. Bus snooping is used to drive the MEI three-state cache coherency protocol that ensures the coherency of global memory with respect to the processor's data cache. The MEI protocol is described in Section 3.3.2, "MEI Protocol." On a cache miss, the MPC750's cache blocks are filled in four beats of 64 bits each. The burst fill is performed as a critical-double-word-first operation; the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. The instruction and data caches are integrated into the MPC750 as shown in Figure 3-1.
[Figure: block diagram. Labeled elements: Instruction Unit; Load/Store Unit (LSU); I-Cache (32-Kbyte, 8-way set associative) with cache tags and cache logic; D-Cache (32-Kbyte, 8-way set associative) with cache tags and cache logic; MMU/L2 BIU (MPC750 only)/60x BIU. Labeled paths: Instructions (0-127), EA (20-26), Data (0-63), PA (0-19), Instructions (0-63), PA (0-31), Data (0-63). EA: Effective Address; PA: Physical Address.]
Figure 3-1. Cache Integration
Both caches are tightly coupled to the MPC750's bus interface unit to allow efficient access to the system memory controller and other bus masters. The bus interface unit receives requests for bus operations from the instruction and data caches, and executes the operations per the 60x bus protocol. The BIU provides address queues, prioritizing logic, and bus control logic. The BIU captures snoop addresses for data cache, address queue, and memory reservation (lwarx and stwcx. instruction) operations. The data cache provides buffers for load and store bus operations. All the data for the corresponding address queues (load and store data queues) is located in the data cache. The data queues are considered temporary storage for the cache and not part of the BIU. The
data cache also provides storage for the cache tags required for memory coherency and performs the cache block replacement PLRU function. The data cache supplies data to the GPRs and FPRs by means of the load/store unit. The MPC750's LSU is directly coupled to the data cache to allow efficient movement of data to and from the general-purpose and floating-point registers. The load/store unit provides all logic required to calculate effective addresses, handles data alignment to and from the data cache, and provides sequencing for load and store string and multiple operations. Write operations to the data cache can be performed on a byte, half-word, word, or double-word basis. The instruction cache provides a 128-bit interface to the instruction unit, so four instructions can be made available to the instruction unit in a single clock cycle. The instruction unit accesses the instruction cache frequently in order to sustain the high throughput provided by the six-entry instruction queue.
3.1
Data Cache Organization
The data cache is organized as 128 sets of eight blocks as shown in Figure 3-2. Each block consists of 32 bytes, two state bits, and an address tag. Note that in the PowerPC architecture, the term `cache block,' or simply `block,' when used in the context of cache implementations, refers to the unit of memory at which coherency is maintained. For the MPC750, this is the eight-word cache line. This value may be different for other processors in the family. Each cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries. Note that address bits A[20-26] provide the index to select a cache set. Bits A[27-31] select a byte within a block. The two state bits implement a three-state MEI (modified/exclusive/invalid) protocol, a coherent subset of the standard four-state MESI (modified/exclusive/shared/invalid) protocol. The MEI protocol is described in Section 3.3.2, "MEI Protocol." The tags consist of bits PA[0-19]. Address translation occurs in parallel with set selection (from A[20-26]), and the higher-order address bits (the tag bits in the cache) are physical. The MPC750's on-chip data cache tags are single-ported, and load or store operations must be arbitrated with snoop accesses to the data cache tags. Load or store operations can be performed to the cache on the clock cycle immediately following a snoop access if the snoop misses; snoop hits may block the data cache for two or more cycles, depending on whether a copy-back to main memory is required.
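The address fields described above can be illustrated with a short C sketch. The function and macro names are hypothetical. Because A[20-31] lie within the 4-Kbyte page offset, the set index and byte offset are the same in the effective and physical address, which is consistent with translation proceeding in parallel with set selection.

```c
#include <stdint.h>

#define CACHE_BLOCK_BYTES 32u    /* eight words per block */
#define CACHE_SETS        128u   /* 128 sets of eight blocks */

/* A[27-31]: byte offset within the 32-byte block (low 5 bits). */
static unsigned cache_byte_offset(uint32_t addr)
{
    return addr & (CACHE_BLOCK_BYTES - 1u);
}

/* A[20-26]: set index (the next 7 bits). */
static unsigned cache_set_index(uint32_t addr)
{
    return (addr >> 5) & (CACHE_SETS - 1u);
}

/* PA[0-19]: tag, the upper 20 bits of the physical address. */
static uint32_t cache_tag(uint32_t pa)
{
    return pa >> 12;
}
```

As a check on the geometry, 128 sets of 8 blocks of 32 bytes gives 32,768 bytes, matching the 32-Kbyte cache size.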
[Figure: 128 sets, each containing Block 0 through Block 7. Each block holds an address tag (Address Tag 0 through Address Tag 7), state bits, and Words [0-7] (8 words/block).]
Figure 3-2. Data Cache Organization
3.2
Instruction Cache Organization
The instruction cache also consists of 128 sets of eight blocks, as shown in Figure 3-3. Each block consists of 32 bytes, a single state bit, and an address tag. As with the data cache, each instruction cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits A[27-31] of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries. Also, address bits A[20-26] provide the index to select a set, and bits A[27-29] select a word within a block. The tags consist of bits PA[0-19]. Address translation occurs in parallel with set selection (from A[20-26]), and the higher order address bits (the tag bits in the cache) are physical. The instruction cache differs from the data cache in that it does not implement MEI cache coherency protocol, and a single state bit is implemented that indicates only whether a cache block is valid or invalid. The instruction cache is not snooped, so if a processor modifies a memory location that may be contained in the instruction cache, software must ensure that such memory updates are visible to the instruction fetching mechanism. This can be achieved with the following instruction sequence:
    dcbst   # update memory
    sync    # wait for update
    icbi    # remove (invalidate) copy in instruction cache
    sync    # wait for ICBI operation to be globally performed
    isync   # remove copy in own instruction buffer
These operations are necessary because the processor does not maintain instruction memory coherent with data memory. Software is responsible for enforcing coherency of instruction caches and data memory. Since instruction fetching may bypass the data cache, changes made to items in the data cache may not be reflected in memory until after the instruction fetch completes.
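When more than one cache block of code has been modified, the sequence can be applied to a range of addresses. The C sketch below plans the steps only and does not execute them; all type and function names are hypothetical. Batching the dcbst operations before a single sync, and likewise the icbi operations, is an assumption borrowed from common practice, since the text lists the sequence for a single location. The range is assumed not to wrap past the top of the address space.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK_BYTES 32u

enum sync_op { OP_DCBST, OP_SYNC, OP_ICBI, OP_ISYNC };

struct sync_step {
    enum sync_op op;
    uint32_t     ea;   /* block address; unused for sync and isync */
};

/* Stores one step if there is room; always counts it. */
static size_t emit(struct sync_step *out, size_t cap, size_t n,
                   enum sync_op op, uint32_t ea)
{
    if (n < cap) {
        out[n].op = op;
        out[n].ea = ea;
    }
    return n + 1;
}

/* Plans dcbst/sync/icbi/sync/isync for every 32-byte block that
 * overlaps [start, start + len). Returns the number of steps needed;
 * only the first 'cap' steps are stored in 'out'. */
static size_t plan_icache_sync(uint32_t start, uint32_t len,
                               struct sync_step *out, size_t cap)
{
    size_t n = 0;
    if (len == 0)
        return 0;

    uint32_t first = start & ~(CACHE_BLOCK_BYTES - 1u);
    uint32_t last  = (start + len - 1u) & ~(CACHE_BLOCK_BYTES - 1u);

    for (uint32_t ea = first; ; ea += CACHE_BLOCK_BYTES) {
        n = emit(out, cap, n, OP_DCBST, ea);   /* update memory */
        if (ea == last)
            break;
    }
    n = emit(out, cap, n, OP_SYNC, 0);         /* wait for update */
    for (uint32_t ea = first; ; ea += CACHE_BLOCK_BYTES) {
        n = emit(out, cap, n, OP_ICBI, ea);    /* invalidate I-cache copy */
        if (ea == last)
            break;
    }
    n = emit(out, cap, n, OP_SYNC, 0);         /* wait for icbi */
    n = emit(out, cap, n, OP_ISYNC, 0);        /* discard prefetched copy */
    return n;
}
```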
[Figure: 128 sets, each containing Block 0 through Block 7. Each block holds an address tag (Address Tag 0 through Address Tag 7), a valid bit, and Words [0-7] (8 words/block).]
Figure 3-3. Instruction Cache Organization
3.3
Memory and Cache Coherency
The primary objective of a coherent memory system is to provide the same image of memory to all devices using the system. Coherency allows synchronization and cooperative use of shared resources. Otherwise, multiple copies of a memory location, some containing stale values, could exist in a system resulting in errors when the stale values are used. Each potential bus master must follow rules for managing the state of its cache. This section describes the coherency mechanisms of the PowerPC architecture and the three-state cache coherency protocol of the MPC750 data cache. Note that unless specifically noted, the discussion of coherency in this section applies to the MPC750's data cache only. The instruction cache is not snooped. Instruction cache coherency must be maintained by software. However, the MPC750 does support a fast instruction cache invalidate capability as described in Section 3.4.1.4, "Instruction Cache Flash Invalidation."
3.3.1
Memory/Cache Access Attributes (WIMG Bits)
Some memory characteristics can be set on either a block or page basis by using the WIMG bits in the BAT registers or page table entry (PTE), respectively. The WIMG attributes control the following functionality:
* Write-through (W bit)
* Caching-inhibited (I bit)
* Memory coherency (M bit)
* Guarded memory (G bit)
These bits allow both uniprocessor and multiprocessor system designs to exploit numerous system-level performance optimizations. The WIMG attributes are programmed by the operating system for each page and block. The W and I attributes control how the processor performing an access uses its own cache. The M attribute ensures that coherency is maintained for all copies of the addressed memory location. The G attribute prevents out-of-order loading and prefetching from the addressed memory location. The WIMG attributes occupy four bits in the BAT registers for block address translation and in the PTEs for page address translation. The WIMG bits are programmed as follows:
* The operating system uses the mtspr instruction to program the WIMG bits in the BAT registers for block address translation. The IBAT register pairs do not have a G bit and all accesses that use the IBAT register pairs are considered not guarded.
* The operating system writes the WIMG bits for each page into the PTEs in system memory as it sets up the page tables.
When an access requires coherency, the processor performing the access must inform the coherency mechanisms throughout the system that the access requires memory coherency. The M attribute determines the kind of access performed on the bus (global or local). Software must exercise care with respect to the use of these bits if coherent memory support is desired. Careless specification of these bits may create situations that present coherency paradoxes to the processor. In particular, this can happen when the state of these bits is changed without appropriate precautions (such as flushing the pages that correspond to the changed bits from the caches of all processors in the system) or when the address translations of aliased real addresses specify different values for any of the WIMG bits. These coherency paradoxes can occur within a single processor or across several processors. It is important to note that in the presence of a paradox, the operating system software is responsible for correctness. For real addressing mode (that is, for accesses performed with address translation disabled--MSR[IR] = 0 or MSR[DR] = 0 for instruction or data access, respectively), the WIMG bits are automatically generated as 0b0011 (the data is write-back, caching is enabled, memory coherency is enforced, and memory is guarded).
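To make the four-bit notation concrete, the following C sketch decodes a WIMG value written in the manual's order, with W as the most-significant of the four bits. The struct and function names are hypothetical, and the sketch says nothing about where the bits sit inside a BAT register or PTE.

```c
#include <stdbool.h>

struct wimg {
    bool write_through;      /* W */
    bool caching_inhibited;  /* I */
    bool coherent;           /* M */
    bool guarded;            /* G */
};

/* 0b0011: the value generated automatically in real addressing mode. */
#define WIMG_REAL_MODE 0x3u

/* Decodes a 4-bit WIMG value (W most significant, G least). */
static struct wimg wimg_decode(unsigned bits)
{
    struct wimg a;
    a.write_through     = (bits >> 3) & 1u;
    a.caching_inhibited = (bits >> 2) & 1u;
    a.coherent          = (bits >> 1) & 1u;
    a.guarded           = bits & 1u;
    return a;
}
```

Decoding WIMG_REAL_MODE yields write-back, caching enabled, coherency enforced, and guarded, matching the description of real addressing mode above.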
3.3.2
MEI Protocol
The MPC750 data cache coherency protocol is a coherent subset of the standard MESI four-state cache protocol that omits the shared state. The MPC750's data cache characterizes each 32-byte block it contains as being in one of three MEI states. Addresses presented to the cache are indexed into the cache directory with bits A[20-26], and the upper-order 20 bits from the physical address translation (PA[0-19]) are compared against the indexed cache directory tags. If none of the indexed tags matches, the result is a cache miss. If a tag matches, a cache hit has occurred and the directory indicates the state of the cache block through two state bits kept with the tag. The three possible states for a cache block in the cache are the modified state (M), the exclusive state (E), and the invalid state (I). The three MEI states are defined in Table 3-1.
Table 3-1. MEI State Definitions
MEI State: Modified (M)
Definition: The addressed cache block is present in the cache, and is modified with respect to system memory--that is, the modified data in the cache block has not been written back to memory. The cache block may be present in the MPC750's L2 cache, but it is not present in any other coherent cache.

MEI State: Exclusive (E)
Definition: The addressed cache block is present in the cache, and this cache has exclusive ownership of the addressed block. The addressed block may be present in the MPC750's L2 cache, but it is not present in any other processor's cache. The data in this cache block is consistent with system memory.

MEI State: Invalid (I)
Definition: This state indicates that the address block does not contain valid data or that the addressed cache block is not resident in the cache.
The MPC750 provides dedicated hardware to provide memory coherency by snooping bus transactions. Figure 3-4 shows the MEI cache coherency protocol, as enforced by the MPC750. Figure 3-4 assumes that the WIM bits for the page or block are set to 001; that is, write-back, caching-not-inhibited, and memory coherency enforced. Because data cannot be shared, the MPC750 signals all cache block fills as if they were write misses (read-with-intent-to-modify), which flushes the corresponding copies of the data in all caches external to the MPC750 prior to the cache-block-fill operation. Following the cache block load, the MPC750 is the exclusive owner of the data and may write to it without a bus broadcast transaction.
[Figure: state diagram with the states Invalid, Exclusive, and Modified. Transition labels: RH = Read Hit, RM = Read Miss, WH = Write Hit, WM = Write Miss, SH = Snoop Hit, SH/CRW = Snoop Hit, Cacheable Read/Write, SH/CIR = Snoop Hit, Caching-Inhibited Read. Bus transactions shown: snoop push and cache block fill.]
Figure 3-4. MEI Cache Coherency Protocol--State Diagram (WIM = 001)
To maintain the three-state coherency, all global reads observed on the bus by the MPC750 are snooped as if they were writes, causing the MPC750 to flush the cache block (write the cache block back to memory and invalidate the cache block if it is modified, or simply invalidate the cache block if it is unmodified). The exception to this rule occurs when a snooped transaction is a caching-inhibited read (either burst or single-beat, where TT[0-4] = X1010; see Table 7-1 for clarification), in which case the MPC750 does not invalidate the snooped cache block. If the cache block is modified, the block is written back to memory, and the cache block is marked exclusive. If the cache block is marked exclusive, no bus action is taken, and the cache block remains in the exclusive state. This treatment of caching-inhibited reads decreases the possibility of data thrashing by allowing noncaching devices to read data without invalidating the entry from the MPC750's data cache. Section 3.8, "MEI State Transactions," provides a detailed list of MEI transitions for various operations and WIM bit settings.
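The transitions described above for WIM = 001 can be summarized as a small state function. The C sketch below is a simplified model, not a description of the hardware: the names are hypothetical, castout of a replaced block on a miss is not modeled, and a hit event applied to an invalid block is treated as leaving the state unchanged.

```c
#include <stdbool.h>

enum mei_state { MEI_INVALID, MEI_EXCLUSIVE, MEI_MODIFIED };

enum mei_event {
    EV_READ_HIT,
    EV_READ_MISS,
    EV_WRITE_HIT,
    EV_WRITE_MISS,
    EV_SNOOP_CACHEABLE,  /* SH/CRW: snoop hit, cacheable read or write */
    EV_SNOOP_CI_READ     /* SH/CIR: snoop hit, caching-inhibited read */
};

struct mei_result {
    enum mei_state next;
    bool snoop_push;  /* modified block written back to memory */
    bool block_fill;  /* cache block fill (read with intent to modify) */
};

static struct mei_result mei_step(enum mei_state s, enum mei_event e)
{
    struct mei_result r = { s, false, false };

    switch (e) {
    case EV_READ_HIT:
        break;                              /* state unchanged */
    case EV_READ_MISS:
        r.next = MEI_EXCLUSIVE;
        r.block_fill = true;
        break;
    case EV_WRITE_HIT:
        if (s != MEI_INVALID)
            r.next = MEI_MODIFIED;          /* no bus broadcast needed */
        break;
    case EV_WRITE_MISS:
        r.next = MEI_MODIFIED;
        r.block_fill = true;
        break;
    case EV_SNOOP_CACHEABLE:
        r.snoop_push = (s == MEI_MODIFIED); /* flush if modified */
        r.next = MEI_INVALID;
        break;
    case EV_SNOOP_CI_READ:
        if (s == MEI_MODIFIED) {
            r.snoop_push = true;            /* write back, keep the block */
            r.next = MEI_EXCLUSIVE;
        }
        break;
    }
    return r;
}
```

The last case reflects the stated exception: a caching-inhibited read leaves the block resident, so a noncaching device can read the data without evicting it from the data cache.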
3.3.2.1
MEI Hardware Considerations
While the MPC750 provides the hardware required to monitor bus traffic for coherency, the MPC750 data cache tags are single-ported, and a simultaneous load/store and snoop access represents a resource conflict. In general, the snoop access has highest priority and is given first access to the tags. The load or store access will then occur on the clock following the snoop. The snoop is not given priority into the tags when the snoop coincides with a tag write (for example, validation after a cache block load). In these situations, the snoop is retried and must re-arbitrate before the lookup is possible. Occasionally, cache snoops cannot be serviced and must be retried. These retries occur if the cache is busy with a burst read or write when the snoop operation takes place. Note that it is possible for a snoop to hit a modified cache block that is already in the process of being written to the copy-back buffer for replacement purposes. If this happens, the MPC750 retries the snoop, and raises the priority of the castout operation to allow it to go to the bus before the cache block fill. Another consideration is page table aliasing. If a store hits to a modified cache block but the page table entry is marked write-through (WIMG = 1xxx), then the page has probably been aliased through another page table entry which is marked write-back (WIMG = 0xxx). If this occurs, the MPC750 ignores the modified bit in the cache tag. The cache block is updated during the write-through operation and the block remains in the modified state. The global (GBL) signal, asserted as part of the address attribute field during a bus transaction, enables the snooping hardware of the MPC750. Address bus masters assert GBL to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). If GBL is not asserted for the transaction, that transaction is not snooped by the MPC750. 
Note that the GBL signal is not asserted for instruction fetches, and that GBL is asserted for all data read or write operations when using real addressing mode (that is, address translation is disabled). Normally, GBL reflects the M-bit value specified for the memory reference in the corresponding translation descriptor(s). Care should be taken to minimize the number of pages marked as global, because the retry protocol enforces coherency and can use considerable bus bandwidth if much data is shared. Therefore, available bus bandwidth decreases as more memory is marked as global. The MPC750 snoops a transaction if the transfer start (TS) and GBL signals are asserted together in the same bus clock (this is a qualified snooping condition). No snoop update to the MPC750 cache occurs if the snooped transaction is not marked global. Also, because cache block castouts and snoop pushes do not require snooping, the GBL signal is not asserted for these operations. When the MPC750 detects a qualified snoop condition, the address associated with the TS signal is compared with the cache tags. Snooping finishes if no hit is detected. If, however,
the address hits in the cache, the MPC750 reacts according to the MEI protocol shown in Figure 3-4.
3.3.3
Coherency Precautions in Single Processor Systems
The following coherency paradoxes can be encountered within a single-processor system:
* Load or store to a caching-inhibited page (WIMG = x1xx) and a cache hit occurs. The MPC750 ignores any hits to a cache block in a memory space marked caching-inhibited (WIMG = x1xx). The access is performed on the external bus as if there were no hit. The data in the cache is not pushed, and the cache block is not invalidated.
* Store to a page marked write-through (WIMG = 1xxx) and a cache hit occurs to a modified cache block. The MPC750 ignores the modified bit in the cache tag. The cache block is updated during the write-through operation but the block remains in the modified state (M).
Note that when WIM bits are changed in the page tables or BAT registers, it is critical that the cache contents reflect the new WIM bit settings. For example, if a block or page that had allowed caching becomes caching-inhibited, software should ensure that the appropriate cache blocks are flushed to memory and invalidated.
3.3.4
Coherency Precautions in Multiprocessor Systems
The MPC750's three-state coherency protocol permits no data sharing between the MPC750 and other caches. All burst reads initiated by the MPC750 are performed as read with intent to modify. Burst snoops are interpreted as read with intent to modify or read with no intent to cache. This effectively places all caches in the system into a three-state coherency scheme. Four-state caches may share data amongst themselves but not with the MPC750.
3.3.5 MPC750-Initiated Load/Store Operations
Load and store operations are assumed to be weakly ordered on the MPC750. The load/store unit (LSU) can perform load operations that occur later in the program ahead of store operations, even when the data cache is disabled (see Section 3.3.5.2, "Sequential Consistency of Memory Accesses"). However, strongly ordered load and store operations can be enforced through the setting of the I bit (of the page WIMG bits) when address translation is enabled. Note that when address translation is disabled (real addressing mode), the default WIMG bits cause the I bit to be cleared (accesses are assumed to be cacheable), and thus the accesses are weakly ordered. Refer to Section 5.2, "Real Addressing Mode," for a description of the WIMG bits when address translation is disabled.
The MPC750 does not provide support for direct-store segments. Operations attempting to access a direct-store segment will invoke a DSI exception. For additional information about DSI exceptions, refer to Section 4.5.3, "DSI Exception (0x00300)."
3.3.5.1 Performed Loads and Stores
The PowerPC architecture defines a performed load operation as one that has the addressed memory location bound to the target register of the load instruction. The architecture defines a performed store operation as one where the stored value is the value that any other processor will receive when executing a load operation (that is, of course, until it is changed again). With respect to the MPC750, caching-allowed (WIMG = x0xx) loads and caching-allowed, write-back (WIMG = 00xx) stores are performed when they have arbitrated to address the cache block. Note that in the event of a cache miss, these storage operations may place a memory request into the processor's memory queue, but such operations are considered an extension to the state of the cache with respect to snooping bus operations. Caching-inhibited (WIMG = x1xx) loads, caching-inhibited (WIMG = x1xx) stores, and write-through (WIMG = 1xxx) stores are performed when they have been successfully presented to the external 60x bus.
3.3.5.2 Sequential Consistency of Memory Accesses
The PowerPC architecture requires that all memory operations executed by a single processor be sequentially consistent with respect to that processor. This means that all memory accesses appear to be executed in program order with respect to exceptions and data dependencies. The MPC750 achieves sequential consistency by operating a single pipeline to the cache/MMU. All memory accesses are presented to the MMU in exact program order and therefore exceptions are determined in order. Loads are allowed to bypass stores once exception checking has been performed for the store, but data dependency checking is handled in the load/store unit so that a load will not bypass a store with an address match. Note that although memory accesses that miss in the cache are forwarded to the memory queue for future arbitration for the external bus, all potential synchronous exceptions have been resolved before the cache. In addition, although subsequent memory accesses can address the cache, full coherency checking between the cache and the memory queue is provided to avoid dependency conflicts.
3.3.5.3 Atomic Memory References
The PowerPC architecture defines the Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed (stwcx.) instructions to provide an atomic update function for a single, aligned word of memory. These instructions can be used to develop a rich set of multiprocessor synchronization primitives. Note that atomic memory references constructed using lwarx/stwcx. instructions depend on the presence of a coherent memory system for correct operation. These instructions should not be expected to provide atomic
access to noncoherent memory. For detailed information on these instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in the Programming Environments Manual.
The lwarx instruction performs a load word from memory operation and creates a reservation for the 32-byte section of memory that contains the accessed word. The reservation granularity is 32 bytes. The lwarx instruction makes a nonspecific reservation with respect to the executing processor and a specific reservation with respect to other masters. This means that any subsequent stwcx. executed by the same processor, regardless of address, will cancel the reservation. Also, any bus write or invalidate operation from another processor to an address that matches the reservation address will cancel the reservation.
The stwcx. instruction does not check the reservation for a matching address. The stwcx. instruction is only required to determine whether a reservation exists. The stwcx. instruction performs a store word operation only if the reservation exists. If the reservation has been cancelled for any reason, then the stwcx. instruction fails and clears the CR0[EQ] bit in the condition register. The architectural intent is to follow the lwarx/stwcx. instruction pair with a conditional branch which checks to see whether the stwcx. instruction failed.
If the page table entry is marked caching-allowed (WIMG = x0xx), and an lwarx access misses in the cache, then the MPC750 performs a cache block fill. If the page is marked caching-inhibited (WIMG = x1xx) or the cache is locked, and the access misses, then the lwarx instruction appears on the bus as a single-beat load. All bus operations that are a direct result of either an lwarx instruction or an stwcx. instruction are placed on the bus with a special encoding. Note that this does not force all lwarx instructions to generate bus transactions, but rather provides a means for identifying when an lwarx instruction does generate a bus transaction. If an implementation requires that all lwarx instructions generate bus transactions, then the associated pages should be marked as caching-inhibited.
The state of the reservation is always presented onto the RSRV output signal. This can be used to determine when an internal condition has caused a change in the reservation state.
The MPC750's data cache treats all stwcx. operations as write-through independent of the WIMG settings. However, if the stwcx. operation hits in the MPC750's L2 cache, then the operation completes with the reservation intact in the L2 cache. See Chapter 9, "L2 Cache Interface Operation," for more information. Otherwise, the stwcx. operation continues to the bus interface unit for completion. When the write-through operation completes successfully, either in the L2 cache or on the 60x bus, then the data cache entry is updated (assuming it hits), and CR0[EQ] is modified to reflect the success of the operation. If the reservation is not intact, the stwcx. completes in the bus interface unit without performing a bus transaction, and without modifying either of the caches.
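The reservation rules for lwarx and stwcx. described in this section can be illustrated with a small software model. The C sketch below is not MPC750 code. The names reservation_t, model_lwarx, model_stwcx, and model_snoop_write are hypothetical, and the model covers only the reservation state (the value that RSRV would reflect), not the store data or bus encoding.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESERVATION_GRANULE 32u   /* bytes, as stated in the text */

typedef struct {
    bool     valid;    /* reservation state, as RSRV would show it */
    uint32_t granule;  /* reserved address with the low 5 bits cleared */
} reservation_t;

static uint32_t granule_of(uint32_t addr)
{
    return addr & ~(RESERVATION_GRANULE - 1u);
}

/* lwarx: establish a reservation for the 32-byte granule containing addr.
 * A processor holds at most one reservation, so a new lwarx replaces it. */
void model_lwarx(reservation_t *r, uint32_t addr)
{
    r->valid = true;
    r->granule = granule_of(addr);
}

/* stwcx.: succeeds only if a reservation exists; the address is not compared.
 * The reservation is cancelled either way. Returns the resulting CR0[EQ]. */
bool model_stwcx(reservation_t *r)
{
    bool success = r->valid;
    r->valid = false;
    return success;
}

/* Bus write or invalidate from another master: cancels the reservation
 * only when the snooped address falls in the reserved granule. */
void model_snoop_write(reservation_t *r, uint32_t addr)
{
    if (r->valid && r->granule == granule_of(addr))
        r->valid = false;
}
```

In this model, a snooped write to 0x101C cancels a reservation made at 0x1004 because both addresses lie in the same 32-byte granule, while a snooped write to 0x2000 leaves it intact.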
3.4 Cache Control
The MPC750's L1 caches are controlled by programming specific bits in the HID0 special-purpose register and by issuing dedicated cache control instructions. Section 3.4.1, "Cache Control Parameters in HID0," describes the HID0 cache control bits, and Section 3.4.2, "Cache Control Instructions," describes the cache control instructions.
3.4.1 Cache Control Parameters in HID0
The HID0 special-purpose register contains several bits that invalidate, disable, and lock the instruction and data caches. The following sections describe these facilities.
3.4.1.1 Data Cache Flash Invalidation
The data cache is automatically invalidated when the MPC750 is powered up and during a hard reset. However, a soft reset does not automatically invalidate the data cache. Software must use the HID0 data cache flash invalidate bit (HID0[DCFI]) if data cache invalidation is desired after a soft reset. Once HID0[DCFI] is set through an mtspr operation, the MPC750 automatically clears this bit in the next clock cycle (provided that the data cache is enabled in the HID0 register). Note that some microprocessors that implement the PowerPC architecture accomplish data cache flash invalidation by setting and clearing HID0[DCFI] with two consecutive mtspr instructions (that is, the bit is not automatically cleared by the microprocessor). Software that has this sequence of operations does not need to be changed to run on the MPC750.
3.4.1.2 Data Cache Enabling/Disabling
The data cache may be enabled or disabled by using the data cache enable bit, HID0[DCE]. HID0[DCE] is cleared on power-up, disabling the data cache. When the data cache is in the disabled state (HID0[DCE] = 0), the cache tag state bits are ignored, and all accesses are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI (cache inhibit) signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of HID0[DCE]. Also note that disabling the data cache does not affect the translation logic; translation for data accesses is controlled by MSR[DR]. The setting of the DCE bit must be preceded by a sync instruction to prevent the cache from being enabled or disabled in the middle of a data access. In addition, the cache must be globally flushed before it is disabled to prevent coherency problems when it is re-enabled. Snooping is not performed when the data cache is disabled. The dcbz instruction will cause an alignment exception when the data cache is disabled. The touch load (dcbt and dcbtst) instructions are no-ops when the data cache is disabled.
Other cache operations (caused by the dcbf, dcbst, and dcbi instructions) are not affected by disabling the cache. This can potentially cause coherency errors. For example, a dcbf instruction that hits a modified cache block in the disabled cache will cause a copyback to memory of potentially stale data.
3.4.1.3 Data Cache Locking
The contents of the data cache can be locked by setting the data cache lock bit, HID0[DLOCK]. A data access that hits in a locked data cache is serviced by the cache. However, all accesses that miss in the locked cache are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of HID0[DLOCK]. The MPC750 treats snoop hits to a locked data cache the same as snoop hits to an unlocked data cache. However, any cache block invalidated by a snoop hit remains invalid until the cache is unlocked. The setting of the DLOCK bit must be preceded by a sync instruction to prevent the data cache from being locked during a data access.
3.4.1.4 Instruction Cache Flash Invalidation
The instruction cache is automatically invalidated when the MPC750 is powered up and during a hard reset. However, a soft reset does not automatically invalidate the instruction cache. Software must use the HID0 instruction cache flash invalidate bit (HID0[ICFI]) if instruction cache invalidation is desired after a soft reset. Once HID0[ICFI] is set through an mtspr operation, the MPC750 automatically clears this bit in the next clock cycle (provided that the instruction cache is enabled in the HID0 register). Note that some microprocessors that implement the PowerPC architecture accomplish instruction cache flash invalidation by setting and clearing HID0[ICFI] with two consecutive mtspr instructions (that is, the bit is not automatically cleared by the microprocessor). Software that has this sequence of operations does not need to be changed to run on the MPC750.
3.4.1.5 Instruction Cache Enabling/Disabling
The instruction cache may be enabled or disabled through the use of the instruction cache enable bit, HID0[ICE]. HID0[ICE] is cleared on power-up, disabling the instruction cache. When the instruction cache is in the disabled state (HID0[ICE] = 0), the cache tag state bits are ignored, and all instruction fetches are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of
HID0[ICE]. Also note that disabling the instruction cache does not affect the translation logic; translation for instruction accesses is controlled by MSR[IR]. The setting of the ICE bit must be preceded by an isync instruction to prevent the cache from being enabled or disabled in the middle of an instruction fetch. In addition, the cache must be globally flushed before it is disabled to prevent coherency problems when it is re-enabled. The icbi instruction is not affected by disabling the instruction cache.
3.4.1.6 Instruction Cache Locking
The contents of the instruction cache can be locked by setting the instruction cache lock bit, HID0[ILOCK]. An instruction fetch that hits in a locked instruction cache is serviced by the cache. However, all accesses that miss in the locked cache are propagated to the L2 cache or 60x bus as single-beat transactions. Note that the CI signal always reflects the state of the caching-inhibited memory/cache access attribute (the I bit) independent of the state of HID0[ILOCK]. The setting of the ILOCK bit must be preceded by an isync instruction to prevent the instruction cache from being locked during an instruction fetch.
3.4.2 Cache Control Instructions
The PowerPC architecture defines instructions for controlling both the instruction and data caches (when they exist). The cache control instructions, dcbt, dcbtst, dcbz, dcbst, dcbf, dcbi, and icbi, are intended for the management of the local L1 and L2 caches. The MPC750 interprets the cache control instructions as if they pertain only to its own L1 or L2 caches. These instructions are not intended for managing other caches in the system (except to the extent necessary to maintain coherency). The MPC750 does not snoop cache control instruction broadcasts, except for dcbz when M = 1. The dcbz instruction is the only cache control instruction that causes a broadcast on the 60x bus (when M = 1) to maintain coherency. All other data cache control instructions (dcbi, dcbf, dcbst and dcbz) are not broadcast, unless broadcast is enabled through the HID0[ABE] configuration bit. Note that dcbi, dcbf, dcbst and dcbz do broadcast to the MPC750's L2 cache, regardless of HID0[ABE]. The icbi instruction is never broadcast.
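The 60x bus broadcast rules in the preceding paragraph can be expressed as a single decision function. The C sketch below is one reading of that paragraph, not a definitive statement of MPC750 behavior. In particular, it assumes that dcbz to a page with M = 0 is broadcast only when HID0[ABE] is set. The names cache_instr_t and broadcast_on_60x are hypothetical.

```c
#include <stdbool.h>

/* Cache control instructions whose broadcast behavior the text describes. */
typedef enum { CC_DCBZ, CC_DCBST, CC_DCBF, CC_DCBI, CC_ICBI } cache_instr_t;

/* Returns true if the instruction causes a broadcast on the 60x bus.
 * m_bit: memory-coherence-required (M) attribute of the target page
 * abe:   state of HID0[ABE]
 * Broadcast to the processor's own L2 cache is not modeled here; the
 * text states it happens regardless of HID0[ABE]. */
bool broadcast_on_60x(cache_instr_t instr, bool m_bit, bool abe)
{
    switch (instr) {
    case CC_DCBZ:
        return m_bit || abe;  /* M = 1 always broadcasts; else assumed to need ABE */
    case CC_DCBST:
    case CC_DCBF:
    case CC_DCBI:
        return abe;           /* broadcast only when enabled through ABE */
    case CC_ICBI:
        return false;         /* never broadcast */
    }
    return false;
}
```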
3.4.2.1 Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst)
The Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) instructions provide potential system performance improvement through the use of software-initiated prefetch hints. The MPC750 treats these instructions identically (that is, a dcbtst instruction behaves exactly the same as a dcbt instruction on the MPC750). Note that processor implementations are not required to take any action based on the execution
of these instructions, but they may choose to prefetch the cache block corresponding to the effective address into their cache. The MPC750 loads the data into the cache when the address hits in the TLB or the BAT, is permitted load access from the addressed page, is not directed to a direct-store segment, and is directed at a cacheable page. Otherwise, the MPC750 treats these instructions as no-ops. The data brought into the cache as a result of this instruction is validated in the same manner that a load instruction would be (that is, it is marked as exclusive). The memory reference of a dcbt (or dcbtst) instruction causes the reference bit to be set. Note also that the successful execution of the dcbt (or dcbtst) instruction affects the state of the TLB and cache LRU bits as defined by the PLRU algorithm.
3.4.2.2 Data Cache Block Zero (dcbz)
The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. The dcbz instruction is treated as a store to the addressed byte with respect to address translation and protection.
If the block containing the byte addressed by the EA is in the data cache, all bytes are cleared, and the tag is marked as modified (M). If the block containing the byte addressed by the EA is not in the data cache and the corresponding page is caching-allowed, the block is established in the data cache without fetching the block from main memory, and all bytes of the block are cleared, and the tag is marked as modified (M).
If the contents of the cache block are from a page marked memory coherence required (M = 1), an address-only bus transaction is run prior to clearing the cache block. The dcbz instruction is the only cache control instruction that causes a broadcast on the 60x bus (when M = 1) to maintain coherency. The other cache control instructions are not broadcast unless broadcasting is specifically enabled through the HID0[ABE] configuration bit.
The dcbz instruction executes regardless of whether the cache is locked, but if the cache is disabled, an alignment exception is generated. If the page containing the byte addressed by the EA is caching-inhibited or write-through, then the system alignment exception handler is invoked. BAT and TLB protection violations generate DSI exceptions.
Both the MPC750 and MPC755 processors require protection in the use of the dcbz instruction in order to guarantee cache coherency in a multiprocessor system. Specifically, the dcbz instruction must be:
* Either enveloped by high-level software synchronization protocols (such as semaphores), or
* Preceded by execution of a dcbf instruction to the same address.
One of these precautions must be taken in order to guarantee that there are no simultaneous cache hits from a dcbz instruction and a snoop to that address. If these two events occur simultaneously, stale data may occur, causing system failures.
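The outcomes described above for dcbz can be collected into one decision function. The following C sketch is illustrative. The names dcbz_ctx_t, dcbz_result_t, and dcbz_outcome are hypothetical, and the order in which the exception conditions are tested (DSI before alignment) is an assumption, since this section does not state their relative priority.

```c
#include <stdbool.h>

typedef enum {
    DCBZ_DSI_EXCEPTION,            /* BAT or TLB protection violation */
    DCBZ_ALIGNMENT_EXCEPTION,      /* cache disabled, or page is I = 1 or W = 1 */
    DCBZ_CLEARED_AFTER_BROADCAST,  /* M = 1: address-only transaction, then clear */
    DCBZ_CLEARED                   /* block cleared and marked modified */
} dcbz_result_t;

typedef struct {
    bool protection_violation;  /* translation reported a violation */
    bool dcache_enabled;        /* HID0[DCE] */
    bool w, i, m;               /* W, I, and M attributes of the page */
} dcbz_ctx_t;

/* Cache locking is deliberately absent: the text states that dcbz
 * executes regardless of whether the cache is locked. */
dcbz_result_t dcbz_outcome(const dcbz_ctx_t *c)
{
    if (c->protection_violation)
        return DCBZ_DSI_EXCEPTION;        /* assumed to take priority */
    if (!c->dcache_enabled || c->i || c->w)
        return DCBZ_ALIGNMENT_EXCEPTION;
    return c->m ? DCBZ_CLEARED_AFTER_BROADCAST : DCBZ_CLEARED;
}
```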
3.4.2.3 Data Cache Block Store (dcbst)
The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. This instruction is treated as a load with respect to address translation and memory protection. If the address hits in the cache and the cache block is in the exclusive (E) state, no action is taken. If the address hits in the cache and the cache block is in the modified (M) state, the modified block is written back to memory and the cache block is placed in the exclusive (E) state. The execution of a dcbst instruction does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG bit settings of the block containing the effective address. The dcbst instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception.
3.4.2.4 Data Cache Block Flush (dcbf)
The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. This instruction is treated as a load with respect to address translation and memory protection. If the address hits in the cache, and the block is in the modified (M) state, the modified block is written back to memory and the cache block is placed in the invalid (I) state. If the address hits in the cache, and the cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state. If the address misses in the cache, no action is taken. The execution of dcbf does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG bit settings of the block containing the effective address. The dcbf instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception.
3.4.2.5 Data Cache Block Invalidate (dcbi)
The effective address is computed, translated, and checked for protection violations as defined in the PowerPC architecture. This instruction is treated as a store with respect to address translation and memory protection. If the address hits in the cache, the cache block is placed in the invalid (I) state, regardless of whether the data is modified. Because this instruction may effectively destroy modified data, it is privileged (that is, dcbi is available to programs at the supervisor privilege level, MSR[PR] = 0). The execution of dcbi does not broadcast on the 60x bus unless broadcast is enabled through the HID0[ABE] bit. The function of this instruction is independent of the WIMG
bit settings of the block containing the effective address. The dcbi instruction executes regardless of whether the cache is disabled or locked; however, a BAT or TLB protection violation generates a DSI exception.
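The effects of dcbst, dcbf, and dcbi on a cache block, as described in Sections 3.4.2.3 through 3.4.2.5, differ only in the resulting MEI state and in whether modified data is written back. The C sketch below tabulates them. The names mei_t, dcache_op_t, mei_effect_t, and apply_dcache_op are hypothetical, a cache miss is represented as the invalid state, and address translation and exceptions are not modeled.

```c
#include <stdbool.h>

typedef enum { MEI_INVALID, MEI_EXCLUSIVE, MEI_MODIFIED } mei_t;
typedef enum { OP_DCBST, OP_DCBF, OP_DCBI } dcache_op_t;

typedef struct {
    mei_t next;       /* state of the block after the instruction */
    bool  writeback;  /* true if the modified block is written to memory */
} mei_effect_t;

mei_effect_t apply_dcache_op(dcache_op_t op, mei_t state)
{
    mei_effect_t e = { state, false };

    if (state == MEI_INVALID)
        return e;                      /* miss: no action is taken */

    switch (op) {
    case OP_DCBST:                     /* copy back, keep the block */
        if (state == MEI_MODIFIED) {
            e.writeback = true;
            e.next = MEI_EXCLUSIVE;
        }
        break;
    case OP_DCBF:                      /* copy back if modified, then invalidate */
        e.writeback = (state == MEI_MODIFIED);
        e.next = MEI_INVALID;
        break;
    case OP_DCBI:                      /* invalidate; modified data is discarded */
        e.next = MEI_INVALID;
        break;
    }
    return e;
}
```

The dcbi row shows why that instruction is privileged: it is the only one of the three that can drop a modified block without writing it back.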
3.4.2.6 Instruction Cache Block Invalidate (icbi)
For the icbi instruction, the effective address is not computed or translated, so it cannot generate a protection violation or exception. This instruction performs a virtual lookup into the instruction cache (index only). All ways of the selected instruction cache set are invalidated. The icbi instruction is not broadcast on the 60x bus. The icbi instruction invalidates the cache blocks independent of whether the cache is disabled or locked.
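The index-only behavior of icbi can be pictured with a model of the instruction cache valid bits. The C sketch below assumes the geometry given elsewhere in this chapter (128 sets, eight blocks per set, 32-byte blocks), from which the set index is taken to be the seven address bits above the 5-bit block offset. The names icache_model_t, set_index, and model_icbi are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ICACHE_SETS 128u
#define ICACHE_WAYS 8u
#define BLOCK_BYTES 32u

typedef struct {
    bool valid[ICACHE_SETS][ICACHE_WAYS];
} icache_model_t;

/* Set index: address bits just above the block offset. These bits lie
 * within the page offset, so no translation is needed to obtain them. */
static unsigned set_index(uint32_t addr)
{
    return (addr / BLOCK_BYTES) % ICACHE_SETS;
}

/* icbi: no tag compare is performed; every way of the indexed set is
 * invalidated, whether or not the cache is enabled or locked. */
void model_icbi(icache_model_t *c, uint32_t addr)
{
    memset(c->valid[set_index(addr)], 0, sizeof c->valid[0]);
}
```

A consequence visible in the model is that one icbi can invalidate up to eight unrelated blocks that happen to share a set with the target address.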
3.5 Cache Operations
This section describes the MPC750 cache operations.
3.5.1 Cache Block Replacement/Castout Operations
Both the instruction and data cache use a pseudo least-recently-used (PLRU) replacement algorithm when a new block needs to be placed in the cache. When the data to be replaced is in the modified (M) state, that data is written into a castout buffer while the missed data is being accessed on the bus. When the load completes, the MPC750 then pushes the replaced cache block from the castout buffer to the L2 cache (if L2 is enabled) or to main memory (if L2 is disabled). The replacement logic first checks to see if there are any invalid blocks in the set and chooses the lowest-order, invalid block (L[0-7]) as the replacement target. If all eight blocks in the set are valid, the PLRU algorithm is used to determine which block should be replaced. The PLRU algorithm is shown in Figure 3-5.
[Figure: flowchart. Blocks L0 through L7 are checked in order and the first invalid block is allocated. If all eight blocks are valid, the PLRU bits B0 through B6 select which of L0 through L7 is replaced.]
Figure 3-5. PLRU Replacement Algorithm
Each cache is organized as eight blocks per set by 128 sets. There is a valid bit for each block in the cache, L[0-7]. When all eight blocks in the set are valid, the PLRU algorithm is used to select the replacement target. There are seven PLRU bits, B[0-6] for each set in
the cache. For every hit in the cache, the PLRU bits are updated using the rules specified in Table 3-2.
Table 3-2. PLRU Bit Update Rules
If the Current     Then the PLRU Bits Are Changed to:
Access Is To:      B0   B1   B2   B3   B4   B5   B6
L0                 1    1    x    1    x    x    x
L1                 1    1    x    0    x    x    x
L2                 1    0    x    x    1    x    x
L3                 1    0    x    x    0    x    x
L4                 0    x    1    x    x    1    x
L5                 0    x    1    x    x    0    x
L6                 0    x    0    x    x    x    1
L7                 0    x    0    x    x    x    0
x = Does not change
If all eight blocks are valid, then a block is selected for replacement according to the PLRU bit encodings shown in Table 3-3.
Table 3-3. PLRU Replacement Block Selection
If the PLRU Bits Are:              Then the Block Selected for Replacement Is:
B0 = 0   B1 = 0   B3 = 0           L0
B0 = 0   B1 = 0   B3 = 1           L1
B0 = 0   B1 = 1   B4 = 0           L2
B0 = 0   B1 = 1   B4 = 1           L3
B0 = 1   B2 = 0   B5 = 0           L4
B0 = 1   B2 = 0   B5 = 1           L5
B0 = 1   B2 = 1   B6 = 0           L6
B0 = 1   B2 = 1   B6 = 1           L7
During power-up or hard reset, all the valid bits of the blocks are cleared and the PLRU bits are cleared to point to block L0 of each set. Note that this is also the state of the data or instruction cache after its flash invalidate bit (HID0[DCFI] or HID0[ICFI]) is set.
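The update rules of Table 3-2 and the selection rules of Table 3-3 form a three-level binary tree over the eight blocks of a set. The C sketch below models one set. It is an illustration under stated assumptions, not a description of the hardware implementation. The names plru_set_t, plru_touch, and plru_victim are hypothetical, and bit n of the field b is used to hold PLRU bit Bn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 8

typedef struct {
    bool    valid[WAYS];  /* valid bits L[0-7] */
    uint8_t b;            /* PLRU bits: bit n of this byte holds Bn */
} plru_set_t;

static void set_bit(plru_set_t *s, int n, int v)
{
    s->b = (uint8_t)((s->b & ~(1u << n)) | ((unsigned)v << n));
}

static int get_bit(const plru_set_t *s, int n)
{
    return (s->b >> n) & 1;
}

/* Table 3-2: update the PLRU bits on an access to block `way`. */
void plru_touch(plru_set_t *s, int way)
{
    assert(way >= 0 && way < WAYS);
    set_bit(s, 0, way < 4);             /* B0 = 1 for L0-L3, 0 for L4-L7 */
    if (way < 4)
        set_bit(s, 1, way < 2);         /* B1 = 1 for L0-L1, 0 for L2-L3 */
    else
        set_bit(s, 2, way < 6);         /* B2 = 1 for L4-L5, 0 for L6-L7 */
    set_bit(s, 3 + way / 2, !(way & 1)); /* B3..B6: 1 for the even block of the pair */
}

/* Lowest-numbered invalid block first; otherwise follow Table 3-3. */
int plru_victim(const plru_set_t *s)
{
    for (int w = 0; w < WAYS; w++)
        if (!s->valid[w])
            return w;
    int half = get_bit(s, 0);              /* 0: L0-L3, 1: L4-L7 */
    int pair = get_bit(s, half ? 2 : 1);   /* B1 or B2 selects the pair */
    int idx  = half * 2 + pair;            /* pair number 0..3 */
    return idx * 2 + get_bit(s, 3 + idx);  /* B3..B6 selects within the pair */
}
```

For example, after all eight blocks have been filled and accessed in the order L0 through L7, every PLRU bit is zero and plru_victim returns 0. A subsequent access to L0 sets B0, B1, and B3, and the next victim becomes L4.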
3.5.2 Cache Flush Operations
The instruction cache can be invalidated by executing a series of icbi instructions or by setting HID0[ICFI]. The data cache can be invalidated by executing a series of dcbi instructions or by setting HID0[DCFI]. Any modified entries in the data cache can be copied back to memory (flushed) by using the dcbf instruction or by executing a series of 12 uniquely addressed load or dcbz instructions to each of the 128 sets. The address space should not be shared with any other process to prevent snoop hit invalidations during the flushing routine. Exceptions should be disabled during this time so that the PLRU algorithm does not get disturbed. The data cache flush assist bit, HID0[DCFA], simplifies the software flushing process. When set, HID0[DCFA] forces the PLRU replacement algorithm to ignore the invalid entries and follow the replacement sequence defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions to eight per set. HID0[DCFA] should be set just prior to the beginning of the cache flush routine and cleared after the series of instructions is complete.
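The load-based flush described above implies a simple sizing calculation: the number of loads per set, times 128 sets, times the 32-byte block size. The C sketch below works through that arithmetic and shows the shape of the load loop. It is a host-side illustration only. The names flush_region_bytes and flush_by_loads are hypothetical, and the caller is assumed to have already met the conditions in the text (a private cacheable region, exceptions disabled, and HID0[DCFA] set or cleared as indicated).

```c
#include <stdbool.h>
#include <stddef.h>

#define DCACHE_SETS 128u
#define BLOCK_BYTES 32u

/* Uniquely addressed loads (or dcbz instructions) needed per set. */
static unsigned loads_per_set(bool dcfa_set)
{
    return dcfa_set ? 8u : 12u;
}

/* Size of the contiguous region that yields that many distinct blocks
 * in every set: 12 * 128 * 32 = 48 Kbytes, or 8 * 128 * 32 = 32 Kbytes. */
size_t flush_region_bytes(bool dcfa_set)
{
    return (size_t)loads_per_set(dcfa_set) * DCACHE_SETS * BLOCK_BYTES;
}

/* Read one byte from each cache block of the region. Stepping by the
 * block size walks all 128 sets once per 4 Kbytes, so each set receives
 * the required number of uniquely addressed loads. */
void flush_by_loads(const volatile unsigned char *region, bool dcfa_set)
{
    size_t n = flush_region_bytes(dcfa_set);
    for (size_t off = 0; off < n; off += BLOCK_BYTES)
        (void)region[off];
}
```

Setting HID0[DCFA] therefore reduces the region that must be read from 48 Kbytes to 32 Kbytes, which equals the capacity implied by the cache geometry.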
3.5.3 Data Cache-Block-Fill Operations
The MPC750's data cache blocks are filled in four beats of 64 bits each, with the critical double word loaded first. The data cache is not blocked to internal accesses while the load (caused by a cache miss) completes. This functionality is sometimes referred to as `hits under misses,' because the cache can service a hit while a cache miss fill is waiting to complete. The critical-double-word read from memory is simultaneously written to the data cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. A cache block is filled after a read miss or write miss (read-with-intent-to-modify) occurs in the cache. The cache block that corresponds to the missed address is updated by a burst transfer of the data from the L2 or system memory. Note that if a read miss occurs in a system with multiple bus masters, and the data is modified in another cache, the modified data is first written to external memory before the cache fill occurs.
3.5.4 Instruction Cache-Block-Fill Operations
The MPC750's instruction cache blocks are loaded in four beats of 64 bits each, with the critical double word loaded first. The instruction cache is not blocked to internal accesses while the fetch (caused by a cache miss) completes. On a cache miss, the critical and following double words read from memory are simultaneously written to the instruction cache and forwarded to the instruction queue, thus minimizing stalls due to cache fill latency. There is no snooping of the instruction cache.
3.5.5 Data Cache-Block-Push Operation
When a cache block in the MPC750 is snooped and hit by another bus master and the data is modified, the cache block must be written to memory and made available to the snooping device. The cache block that is hit is said to be pushed out onto the 60x bus. The MPC750 supports two kinds of push operations--normal push operations and enveloped high-priority push operations, which are described in Section 3.5.5.1, "Enveloped High-Priority Cache-Block-Push Operation."
3.5.5.1 Enveloped High-Priority Cache-Block-Push Operation
In cases where the MPC750 has completed the address tenure of a read operation, and then detects a snoop hit to a modified cache block by another bus master, the MPC750 provides a high-priority push operation. If the address snooped is the same as the address of the data to be returned by the read operation, ARTRY is asserted one or more times until the data tenure of the read operation is completed. The cache-block-push transaction can be enveloped within the address and data tenures of a read operation. This feature prevents deadlocks in system organizations that support multiple memory-mapped buses. More specifically, the MPC750 internally detects the scenario where a load request is outstanding and the processor has pipelined a write operation on top of the load. Normally, when the data bus is granted to the MPC750, the resulting data bus tenure is used for the load operation. The enveloped high-priority cache block push feature defines a bus signal, data bus write only (DBWO), which when asserted with a qualified data bus grant indicates that the resulting data tenure should be used for the store operation instead. This signal is described in Section 8.10, "Using Data Bus Write Only." Note that the enveloped copy-back operation is an internally pipelined bus operation.
3.6 L1 Caches and 60x Bus Transactions
The MPC750 transfers data to and from the cache in single-beat transactions of two words, or in four-beat transactions of eight words which fill a cache block. Single-beat bus transactions can transfer from one to eight bytes to or from the MPC750, and can be misaligned. Single-beat transactions can be caused by cache write-through accesses, caching-inhibited accesses (WIMG = x1xx), accesses when the cache is disabled (HID0[DCE] bit is cleared), or accesses when the cache is locked (HID0[DLOCK] bit is set). Burst transactions on the MPC750 always transfer eight words of data at a time, and are aligned to a double-word boundary. The MPC750 transfer burst (TBST) output signal indicates to the system whether the current transaction is a single-beat transaction or four-beat burst transfer. Burst transactions have an assumed address order. For cacheable read operations, instruction fetches, or cacheable, non-write-through write operations that miss the cache, the MPC750 presents the double-word-aligned address associated with the load/store instruction or instruction fetch that initiated the transaction.
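The causes of single-beat and burst transactions listed above can be summarized as a classification of one data access. The C sketch below is a simplified reading of this section together with Sections 3.3.3, 3.4.1.2, and 3.4.1.3. It ignores the L2 cache, castouts, and snoop pushes, and the names access_t, xfer_t, and bus_transfer_for are hypothetical.

```c
#include <stdbool.h>

typedef enum { XFER_NONE, XFER_SINGLE_BEAT, XFER_BURST } xfer_t;

typedef struct {
    bool is_store;
    bool hit;     /* address hits in the data cache */
    bool w, i;    /* write-through and caching-inhibited attributes */
    bool dce;     /* HID0[DCE]: data cache enabled */
    bool dlock;   /* HID0[DLOCK]: data cache locked */
} access_t;

xfer_t bus_transfer_for(const access_t *a)
{
    if (!a->dce || a->i)
        return XFER_SINGLE_BEAT;  /* cache disabled or caching-inhibited:
                                     any hit is ignored */
    if (a->is_store && a->w)
        return XFER_SINGLE_BEAT;  /* write-through store always reaches the bus */
    if (a->hit)
        return XFER_NONE;         /* serviced by the cache, locked or not */
    if (a->dlock)
        return XFER_SINGLE_BEAT;  /* miss in a locked cache: no block fill */
    return XFER_BURST;            /* cacheable miss: four-beat block fill */
}
```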
As shown in Figure 3-6, the first quad word contains the address of the load/store or instruction fetch that missed the cache. This minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the block is filled. For all other burst operations, however, the entire block is transferred in order (oct-word-aligned). Critical-double-word-first fetching on a cache miss applies to both the data and instruction cache.
MPC750 Cache Address Bits (27...28)
Bits 27-28  | 00 | 01 | 10 | 11
Double word | A  | B  | C  | D

If the address requested is in double-word A, the address placed on the bus is that of double-word A, and the four data beats are ordered in the following manner:
Beat        | 0 | 1 | 2 | 3
Double word | A | B | C | D

If the address requested is in double-word C, the address placed on the bus will be that of double-word C, and the four data beats are ordered in the following manner:
Beat        | 0 | 1 | 2 | 3
Double word | C | D | A | B

Figure 3-6. Double-Word Address Ordering--Critical Double Word First
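The wrap-around ordering in Figure 3-6 can be sketched in C. This is only an illustration: the function name burst_beat_order and the constant BEATS_PER_BURST are hypothetical, and the sketch assumes that the pattern shown for double words A and C (start at the critical double word, continue sequentially, then wrap) also applies when the miss falls in B or D.

```c
#include <assert.h>
#include <stdint.h>

/* Number of double-word beats in one 32-byte cache block burst. */
#define BEATS_PER_BURST 4

/*
 * Fill order[] with the double-word index (0 = A, 1 = B, 2 = C, 3 = D)
 * transferred on each beat of a critical-double-word-first burst.
 * addr is the 32-bit physical address of the access that missed.
 */
void burst_beat_order(uint32_t addr, unsigned order[BEATS_PER_BURST])
{
    assert(order != 0);

    /* PowerPC numbers bit 0 as the MSB, so A[27-28] sit 3 bits above the LSB. */
    unsigned first = (addr >> 3) & 0x3u;

    for (unsigned beat = 0; beat < BEATS_PER_BURST; beat++)
        order[beat] = (first + beat) % BEATS_PER_BURST;  /* sequential, wrapping */
}
```

For an address whose bits 27-28 are 10 (double word C), the function fills order[] with 2, 3, 0, 1, which corresponds to the C, D, A, B sequence in the figure.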
3.6.1 Read Operations and the MEI Protocol
The MEI coherency protocol affects how the MPC750 data cache performs read operations on the 60x bus. All reads (except for caching-inhibited reads) are encoded on the bus as read-with-intent-to-modify (RWITM) to force flushing of the addressed cache block from other caches in the system. The MEI coherency protocol also affects how the MPC750 snoops read operations on the 60x bus. All reads snooped from the 60x bus (except for caching-inhibited reads) are interpreted as RWITM to cause flushing from the MPC750's cache. Single-beat reads (TBST negated) are interpreted by the MPC750 as caching inhibited. These actions for read operations allow the MPC750 to operate successfully (coherently) on the bus with other bus masters that implement either the three-state MEI or a four-state MESI cache coherency protocol.
3.6.2 Bus Operations Caused by Cache Control Instructions
The cache control, TLB management, and synchronization instructions supported by the MPC750 may affect or be affected by the operation of the 60x bus. The operation of the
instructions may also indirectly cause bus transactions to be performed, or their completion may be linked to the bus. The dcbz instruction is the only cache control instruction that causes an address-only broadcast on the 60x bus regardless of configuration. The other data cache control instructions (dcbi, dcbf, and dcbst) are not broadcast unless specifically enabled through the HID0[ABE] configuration bit. Note that dcbi, dcbf, dcbst, and dcbz do broadcast to the MPC750's L2 cache, regardless of HID0[ABE]. HID0[ABE] also controls the broadcast of the sync and eieio instructions. The icbi instruction is never broadcast. No broadcasts by other masters are snooped by the MPC750 (except for dcbz kill block transactions).
For detailed information on the cache control instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in the Programming Environments Manual. Table 3-4 provides an overview of the bus operations initiated by cache control instructions. Note that Table 3-4 assumes that the WIM bits are set to 001; that is, the cache is operating in write-back mode, caching is permitted, and coherency is enforced.
Table 3-4. Bus Operations Caused by Cache Control Instructions (WIM = 001)
Instruction | Current Cache State | Next Cache State | Bus Operation | Comment
sync | Don't care | No change | sync (if enabled in HID0[ABE]) | Waits for memory queues to complete bus activity
tlbie | -- | -- | None | --
tlbsync | -- | -- | None | Waits for the negation of the TLBSYNC input signal to complete
eieio | Don't care | No change | eieio (if enabled in HID0[ABE]) | Address-only bus operation
icbi | Don't care | I | None | --
dcbi | Don't care | I | Kill block (if enabled in HID0[ABE]) | Address-only bus operation
dcbf | I, E | I | Flush block (if enabled in HID0[ABE]) | Address-only bus operation
dcbf | M | I | Write with kill | Block is pushed
dcbst | I, E | No change | Clean block (if enabled in HID0[ABE]) | Address-only bus operation
dcbst | M | E | Write with kill | Block is pushed
dcbz | I | M | Write with kill | --
dcbz | E, M | M | Kill block | Writes over modified data
dcbt | I | E | Read-with-intent-to-modify | Fetched cache block is stored in the cache
dcbt | E, M | No change | None | --
dcbtst | I | E | Read-with-intent-to-modify | Fetched cache block is stored in the cache
dcbtst | E, M | No change | None | --
For additional details about the specific bus operations performed by the MPC750, see Chapter 8, "System Interface Operation."
3.6.3 Snooping
The MPC750 maintains data cache coherency in hardware by coordinating activity between the data cache, the bus interface logic, the L2 cache, and the memory system. The MPC750 has a copy-back cache which relies on bus snooping to maintain cache coherency with other caches in the system. For the MPC750, the coherency size of the bus is the size of a cache block, 32 bytes. This means that any bus transactions that cross an aligned 32-byte boundary must present a new address onto the bus at that boundary for proper snoop operation by the MPC750, or they must operate noncoherently with respect to the MPC750.
As bus operations are performed on the bus by other bus masters, the MPC750 bus snooping logic monitors the addresses and transfer attributes that are referenced. The MPC750 snoops the bus transactions during the cycle that TS is asserted for any of the following qualified snoop conditions:
* The global signal (GBL) is asserted indicating that coherency enforcement is required.
* A reservation is currently active in the MPC750 as the result of an lwarx instruction, and the transfer type attributes (TT[0-4]) indicate a write or kill operation. These transactions are snooped regardless of whether GBL is asserted to support reservations in the MEI cache protocol.
The state of ABB is not sampled to determine a qualified snoop condition. All transactions snooped by the MPC750 are checked for correct address bus parity. Every assertion of TS detected by the MPC750 (whether snooped or not) must be followed by an accompanying assertion of AACK. Once a qualified snoop condition is detected on the bus, the snooped address associated with TS is compared against the data cache tags, memory queues, and/or other storage elements as appropriate. The L1 data cache tags and L2 cache tags are snooped for standard data cache coherency support. No snooping is done in the instruction cache for coherency.
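The 32-byte coherency granule and the two qualified snoop conditions described above can be expressed as a short C sketch. All names here (coherency_block_base, crosses_coherency_block, is_qualified_snoop) are hypothetical, and the decoding of TT[0-4] into "write or kill" is assumed to be done by the caller; this is a model of the rules in the text, not of the actual snoop logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Coherency size of the bus: one cache block. */
#define COHERENCY_BLOCK_BYTES 32u

/* Base address of the aligned 32-byte block that contains addr. */
uint32_t coherency_block_base(uint32_t addr)
{
    return addr & ~(uint32_t)(COHERENCY_BLOCK_BYTES - 1u);
}

/*
 * True if a transfer of len bytes starting at addr touches more than one
 * coherency block, in which case a new address must be presented at the
 * 32-byte boundary for the transfer to be snooped correctly.
 */
bool crosses_coherency_block(uint32_t addr, uint32_t len)
{
    assert(len > 0 && len - 1u <= UINT32_MAX - addr);  /* no address wrap */
    return coherency_block_base(addr) != coherency_block_base(addr + (len - 1u));
}

/*
 * Qualified snoop condition sampled in the cycle TS is asserted:
 * GBL asserted, or an active lwarx reservation together with a
 * write or kill transfer type.
 */
bool is_qualified_snoop(bool gbl_asserted, bool reservation_active,
                        bool tt_is_write_or_kill)
{
    return gbl_asserted || (reservation_active && tt_is_write_or_kill);
}
```

For example, an 8-byte transfer starting at 0x1C spans 0x1C through 0x23 and therefore crosses the boundary at 0x20, while a 32-byte transfer starting at 0x20 does not cross one.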
The memory queues are snooped for pipeline collisions and memory coherency collisions. A pipeline collision is detected when another bus master addresses any portion of a line that this MPC750's data cache is currently in the process of loading (L1 loading from L2, or L1/L2 loading from memory). A memory coherency collision occurs when another bus master addresses any portion of a line that the MPC750 has currently queued to write to memory from the data cache (castout or copy-back), but has not yet been granted bus access to perform. If a snooped transaction results in a cache hit or pipeline collision or memory queue collision, the MPC750 asserts ARTRY on the 60x bus. The current bus master, detecting the assertion of the ARTRY signal, should abort the transaction and retry it at a later time, so that the MPC750 can first perform a write operation back to memory from its cache or memory queues. The MPC750 may also retry a bus transaction if it is unable to snoop the transaction on that cycle due to internal resource conflicts. Additional snoop action may be forwarded to the cache as a result of a snoop hit in some cases (a cache push of modified data, or a cache block invalidation).
3.6.4 Snoop Response to 60x Bus Transactions
There are several bus transaction types defined for the 60x bus. The transactions in Table 3-5 correspond to the transfer type signals TT[0-4], which are described in Section 7.2.4.1, "Transfer Type (TT[0-4])."
Table 3-5. Response to Snooped Bus Transactions
Snooped Transaction | TT[0-4] | MPC750 Response
Clean block | 00000 | No action is taken.
Flush block | 00100 | No action is taken.
SYNC | 01000 | No action is taken.
Kill block | 01100 | The kill block operation is an address-only bus transaction initiated when a dcbz or dcbi instruction is executed.
    * If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the invalid (I) state.
    * If the address misses in the cache, no action is taken.
    Any reservation associated with the address is canceled.
EIEIO | 10000 | No action is taken.
External control word write | 10100 | No action is taken.
TLB invalidate | 11000 | No action is taken.
External control word read | 11100 | No action is taken.
lwarx reservation set | 00001 | No action is taken.
Reserved | 00101 | --
TLBSYNC | 01001 | No action is taken.
ICBI | 01101 | No action is taken.
Reserved | 1xx01 | --
Write-with-flush | 00010 | A write-with-flush operation is a single-beat or burst transaction initiated when a caching-inhibited or write-through store instruction is executed.
    * If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the invalid (I) state.
    * If the address misses in the cache, no action is taken.
    Any reservation associated with the address is canceled.
Write-with-kill | 00110 | A write-with-kill operation is a burst transaction initiated due to a castout, caching-allowed push, or snoop copy-back.
    * If the address hits in the cache, the cache block is placed in the invalid (I) state (killing modified data that may have been in the block).
    * If the address misses in the cache, no action is taken.
    Any reservation associated with the address is canceled.
Read | 01010 | A read operation is used by most single-beat and burst load transactions on the bus.
    For single-beat, caching-inhibited read transactions:
    * If the addressed cache block is in the exclusive (E) state, the cache block remains in the exclusive (E) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the exclusive (E) state.
    * If the address misses in the cache, no action is taken.
    For burst read transactions:
    * If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the invalid (I) state.
    * If the address misses in the cache, no action is taken.
Read-with-intent-to-modify (RWITM) | 01110 | A RWITM operation is issued to acquire exclusive use of a memory location for the purpose of modifying it.
    * If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the invalid (I) state.
    * If the address misses in the cache, no action is taken.
Write-with-flush-atomic | 10010 | Write-with-flush-atomic operations occur after the processor issues an stwcx. instruction.
    * If the addressed cache block is in the exclusive (E) state, the cache block is placed in the invalid (I) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the invalid (I) state.
    * If the address misses in the cache, no action is taken.
    Any reservation is canceled, regardless of the address.
Reserved | 10110 | --
Read-atomic | 11010 | Read atomic operations appear on the bus in response to lwarx instructions and generate the same snooping responses as read operations.
Read-with-intent-to-modify-atomic | 11110 | The RWITM atomic operations appear on the bus in response to stwcx. instructions and generate the same snooping responses as RWITM operations.
Reserved | 00011 | --
Reserved | 00111 | --
Read-with-no-intent-to-cache (RWNITC) | 01011 | A RWNITC operation is issued to acquire exclusive use of a memory location with no intention of modifying the location.
    * If the addressed cache block is in the exclusive (E) state, the cache block remains in the exclusive (E) state.
    * If the addressed cache block is in the modified (M) state, the MPC750 asserts ARTRY and initiates a push of the modified block out of the cache and the cache block is placed in the exclusive (E) state.
    * If the address misses in the cache, no action is taken.
Reserved | 01111 | --
Reserved | 1xx11 | --
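The cache-state side of Table 3-5 can be summarized as a small decision function. The following C sketch is a reading aid only: the type and function names are hypothetical, the TT[0-4] values are transcribed from the table, and reservation cancellation is deliberately left out. ARTRY is reported only for the modified-block push cases that the table lists.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MEI_INVALID, MEI_EXCLUSIVE, MEI_MODIFIED } mei_state;

/* TT[0-4] encodings from Table 3-5 for the transactions that can change state. */
enum {
    TT_WRITE_WITH_FLUSH        = 0x02, /* 00010 */
    TT_WRITE_WITH_KILL         = 0x06, /* 00110 */
    TT_READ                    = 0x0A, /* 01010 */
    TT_RWNITC                  = 0x0B, /* 01011 */
    TT_KILL_BLOCK              = 0x0C, /* 01100 */
    TT_RWITM                   = 0x0E, /* 01110 */
    TT_WRITE_WITH_FLUSH_ATOMIC = 0x12, /* 10010 */
    TT_READ_ATOMIC             = 0x1A, /* 11010 */
    TT_RWITM_ATOMIC            = 0x1E  /* 11110 */
};

typedef struct {
    mei_state next;   /* state of the snooped block afterwards */
    bool      artry;  /* true when ARTRY is asserted to push modified data */
} snoop_result;

/* burst is true for burst transactions, false for single-beat ones. */
snoop_result snoop_response(unsigned tt, bool burst, mei_state cur)
{
    snoop_result r = { cur, false };
    assert(tt < 32u);

    if (cur == MEI_INVALID)
        return r;                        /* miss: no action is taken */

    switch (tt) {
    case TT_WRITE_WITH_KILL:
        r.next = MEI_INVALID;            /* hit: invalidate, modified data is killed */
        break;
    case TT_KILL_BLOCK:
    case TT_WRITE_WITH_FLUSH:
    case TT_WRITE_WITH_FLUSH_ATOMIC:
    case TT_RWITM:
    case TT_RWITM_ATOMIC:
        r.artry = (cur == MEI_MODIFIED); /* push modified block first */
        r.next = MEI_INVALID;
        break;
    case TT_READ:
    case TT_READ_ATOMIC:
        r.artry = (cur == MEI_MODIFIED);
        r.next = burst ? MEI_INVALID : MEI_EXCLUSIVE;
        break;
    case TT_RWNITC:
        r.artry = (cur == MEI_MODIFIED);
        r.next = MEI_EXCLUSIVE;
        break;
    default:
        break;                           /* remaining encodings: no action */
    }
    return r;
}
```

A snooped burst read that hits a modified block therefore yields ARTRY and an invalid block, whereas a single-beat read of an exclusive block leaves the block exclusive.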
3.6.5 Transfer Attributes
In addition to the address and transfer type signals, the MPC750 supports the transfer attribute signals TBST, TSIZ[0-2], WT, CI, and GBL. The TBST and TSIZ[0-2] signals indicate the data transfer size for the bus transaction. The WT signal reflects the write-through status (the complement of the W bit) for the transaction as determined by the MMU address translation during write operations. WT is asserted for burst writes due to dcbf (flush) and dcbst (clean) instructions, and for snoop pushes; WT is negated for ecowx transactions. Since the write-through status is not meaningful for reads, the MPC750 uses the WT signal during read transactions to indicate that the transaction is an instruction fetch (WT negated), or not an instruction fetch (WT asserted). The CI signal reflects the caching-inhibited/allowed status (the complement of the I bit) of the transaction as determined by the MMU address translation even if the L1 caches are
disabled or locked. CI is always asserted for eciwx/ecowx bus transactions independent of the address translation. The GBL signal reflects the memory coherency requirements (the complement of the M bit) of the transaction as determined by the MMU address translation. Castout and snoop copy-back operations (TT[0-4] = 00110) are generally marked as nonglobal (GBL negated) and are not snooped (except for reservation monitoring). Other masters, however, may perform DMA write operations with this encoding but marked global (GBL asserted) and thus must be snooped. Table 3-6 summarizes the address and transfer attribute information presented on the bus by the MPC750 for various master or snoop-related transactions.
Table 3-6. Address/Transfer Attribute Summary
Bus Transaction                                                          A[0-31]              TT[0-4]  TBST  TSIZ[0-2]  GBL  WT  CI
Instruction fetch operations:
Burst (caching-allowed)                                                  PA[0-28] || 0b000    01110    0     010        ~M   1   1*
Single-beat read (caching-inhibited or cache disabled)                   PA[0-28] || 0b000    01010    1     000        ~M   1   ~I
Data cache operations:
Cache block fill (due to load or store miss)                             PA[0-28] || 0b000    A1110    0     010        ~M   0   1*
Castout (normal replacement)                                             CA[0-26] || 0b00000  00110    0     010        1    1   1*
Push (cache block push due to dcbf/dcbst)                                PA[0-26] || 0b00000  00110    0     010        1    0   1*
Snoop copyback                                                           CA[0-26] || 0b00000  00110    0     010        1    0   1*
Data cache bypass operations:
Single-beat read (caching-inhibited or cache disabled)                   PA[0-31]             A1010    1     SSS        ~M   0   ~I
Single-beat write (caching-inhibited, write-through, or cache disabled)  PA[0-31]             00010    1     SSS        ~M   ~W  ~I
Special instructions:
dcbz (addr-only)                                                         PA[0-28] || 0b000    01100    0     010        0*   0   1*
dcbi (if HID0[ABE] = 1, addr-only)                                       PA[0-26] || 0b00000  01100    0     010        ~M   0   1*
dcbf (if HID0[ABE] = 1, addr-only)                                       PA[0-26] || 0b00000  00100    0     010        ~M   0   1*
dcbst (if HID0[ABE] = 1, addr-only)                                      PA[0-26] || 0b00000  00000    0     010        ~M   0   1*
sync (if HID0[ABE] = 1, addr-only)                                       0x0000_0000          01000    0     010        0    0   0
eieio (if HID0[ABE] = 1, addr-only)                                      0x0000_0000          10000    0     010        0    0   0
stwcx. (always single-beat write)                                        PA[0-29] || 0b00     10010    1     100        ~M   ~W  ~I
eciwx                                                                    PA[0-29] || 0b00     11100    EAR[28-31]       1    0   0
ecowx                                                                    PA[0-29] || 0b00     10100    EAR[28-31]       1    1   0
Notes:
PA = Physical address, CA = Cache address.
W, I, M = WIM state from address translation; ~ = complement; 0* or 1* = WIM state implied by transaction type in table.
For instruction fetches, reflection of the M bit must be enabled through HID0[IFEM].
A = Atomic; high if lwarx, low otherwise.
S = Transfer size.
For eciwx and ecowx, EAR[28-31] is driven on the TBST and TSIZ[0-2] signals.
Special instructions listed may not generate bus transactions depending on cache state.
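The relationship between the WIM bits and the GBL, WT, and CI signal levels for single-beat data cache bypass transactions can be sketched as follows. This is an illustrative model rather than device code: the names are hypothetical, the WIM value is passed as a 3-bit number in the same order the text writes it (W, I, M), and the sketch assumes the signals are active low, so a level of 0 means asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* WIM written as a 3-bit value, matching the "WIM = 001" notation. */
#define WIM_W 0x4u
#define WIM_I 0x2u
#define WIM_M 0x1u

/* Signal levels as driven on the bus: 0 = asserted, 1 = negated. */
typedef struct {
    unsigned gbl;
    unsigned wt;
    unsigned ci;
} attr_levels;

/* Assumption: active-low signals, so asserted maps to level 0. */
static unsigned active_low(bool asserted)
{
    return asserted ? 0u : 1u;
}

/* Single-beat write: each signal carries the complement of its WIM bit. */
attr_levels single_beat_write_attrs(unsigned wim)
{
    assert(wim <= 7u);
    attr_levels a;
    a.gbl = active_low((wim & WIM_M) != 0);
    a.wt  = active_low((wim & WIM_W) != 0);
    a.ci  = active_low((wim & WIM_I) != 0);
    return a;
}

/*
 * Single-beat read: GBL and CI as for writes, but WT is reused to flag
 * instruction fetches (negated) versus other reads (asserted).
 */
attr_levels single_beat_read_attrs(unsigned wim, bool is_instruction_fetch)
{
    attr_levels a = single_beat_write_attrs(wim);
    a.wt = active_low(!is_instruction_fetch);
    return a;
}
```

With WIM = 001 a single-beat write drives GBL = 0, WT = 1, CI = 1, which matches the single-beat write row of Table 3-6 when ~M, ~W, and ~I are evaluated. The sketch does not model HID0[IFEM], which the notes say is required for instruction fetches to reflect the M bit.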
3.7 Bus Interface
The bus interface buffers bus requests from the instruction and data caches, and executes the requests per the 60x bus protocol. It includes address register queues, prioritizing logic, and bus control logic. The bus interface also captures snoop addresses for snooping in the cache and in the address register queues, snoops for reservations, and holds the touch load address for the cache. All data storage for the address register buffers (load and store data buffers) is located in the cache section. The data buffers are considered temporary storage for the cache and not part of the bus interface. The general functions and features of the bus interface are as follows:
* Six address register buffers that include the following:
  -- Instruction cache load address buffer
  -- Data cache load address buffer
  -- Two data cache castout/store address buffers (associated data block buffers located in cache)
  -- Data cache snoop copy-back address buffer (associated data block buffer located in cache)
  -- Reservation address buffer for snoop monitoring
* Pipeline collision detection for data cache buffers
* Reservation address snooping for lwarx/stwcx. instructions
* One-level address pipelining
* Load ahead of store capability
A conceptual block diagram of the bus interface is shown in Figure 3-7. The address register queues in the figure hold transaction requests that the bus interface may issue on the bus independently of the other requests. The bus interface may have up to two transactions operating on the bus at any given time through the use of address pipelining.
[Figure 3-7 block labels: I-Cache, D-Cache, BIU Control, I-Cache LD Addr, D-Cache LD Addr, D-Cache CST/ST Addr 0, D-Cache CST/ST Addr 1, D-Cache SNP Addr, Snoop Control, Addr and Data paths to L2 or System Bus]
Figure 3-7. Bus Interface Address Buffers
For additional information about the MPC750 bus interface and the bus protocols, refer to Chapter 8, "System Interface Operation."
3.8 MEI State Transactions
Table 3-7 shows MEI state transitions for various operations. Bus operations are described in Table 3-5.
Table 3-7. MEI State Transitions
(An empty cell repeats the value of the row above it.)
Operation | Cache Operation | Bus sync | WIM | Current Cache State | Next Cache State | Cache Actions | Bus Operation
Load (T = 0) | Read | No | x0x | I | Same | 1 Cast out of modified block (as required) | Write-with-kill
 |  |  |  |  |  | 2 Pass four-beat read to memory queue | Read
Load (T = 0) | Read | No | x0x | E,M | Same | Read data from cache | --
Load (T = 0) | Read | No | x1x | I | Same | Pass single-beat read to memory queue | Read
Load (T = 0) | Read | No | x1x | E | I | CRTRY read | --
Load (T = 0) | Read | No | x1x | M | I | CRTRY read (push sector to write queue) | Write-with-kill
lwarx | Read | Acts like other reads but bus operation uses special encoding
Store (T = 0) | Write | No | 00x | I | Same | Cast out of modified block (if necessary) | Write-with-kill
 |  |  |  |  |  | Pass RWITM to memory queue | RWITM
Store (T = 0) | Write | No | 00x | E,M | M | Write data to cache | --
Store ≠ stwcx. (T = 0) | Write | No | 10x | I | Same | Pass single-beat write to memory queue | Write-with-flush
Store ≠ stwcx. (T = 0) | Write | No | 10x | E | Same | Write data to cache | --
 |  |  |  |  |  | Pass single-beat write to memory queue | Write-with-flush
Store ≠ stwcx. (T = 0) | Write | No | 10x | M | Same | CRTRY write | --
 |  |  |  |  |  | Push block to write queue | Write-with-kill
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | I | Same | Pass single-beat write to memory queue | Write-with-flush
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | E | I | CRTRY write | --
Store (T = 0) or stwcx. (WIM = 10x) | Write | No | x1x | M | I | CRTRY write | --
 |  |  |  |  |  | Push block to write queue | Write-with-kill
stwcx. | Conditional write | If the reserved bit is set, this operation is like other writes except the bus operation uses a special encoding.
dcbf | Data cache block flush | No | xxx | I,E | Same | CRTRY dcbf | --
 |  |  |  |  | Same | Pass flush | Flush
 |  |  |  |  | I | State change only | --
dcbf | Data cache block flush | No | xxx | M | I | Push block to write queue | Write-with-kill
dcbst | Data cache block store | No | xxx | I,E | Same | CRTRY dcbst | --
 |  |  |  |  | Same | Pass clean | Clean
 |  |  |  |  | Same | No action | --
dcbst | Data cache block store | No | xxx | M | E | Push block to write queue | Write-with-kill
dcbz | Data cache block set to zero | No | x1x | x | x | Alignment trap | --
dcbz | Data cache block set to zero | No | 10x | x | x | Alignment trap | --
dcbz | Data cache block set to zero | Yes | 00x | I | Same | CRTRY dcbz | --
 |  |  |  |  |  | Cast out of modified block | Write-with-kill
 |  |  |  |  | Same | Pass kill | Kill
 |  |  |  |  | M | Clear block | --
dcbz | Data cache block set to zero | No | 00x | E,M | M | Clear block | --
dcbt | Data cache block touch | No | x1x | I | Same | Pass single-beat read to memory queue | Read
dcbt | Data cache block touch | No | x1x | E | I | CRTRY read | --
dcbt | Data cache block touch | No | x1x | M | I | CRTRY read | --
 |  |  |  |  |  | Push block to write queue | Write-with-kill
dcbt | Data cache block touch | No | x0x | I | Same | Cast out of modified block (as required) | Write-with-kill
 |  |  |  |  |  | Pass four-beat read to memory queue | Read
dcbt | Data cache block touch | No | x0x | E,M | Same | No action | --
Single-beat read | Reload dump 1 | No | xxx | I | Same | Forward data_in | --
Four-beat read (double-word-aligned) | Reload dump | No | xxx | I | E | Write data_in to cache | --
Four-beat write (double-word-aligned) | Reload dump | No | xxx | I | M | Write data_in to cache | --
E->I | Snoop write or kill | No | xxx | E | I | State change only (committed) | --
M->I | Snoop kill | No | xxx | M | I | State change only (committed) | --
Push M->I | Snoop flush | No | xxx | M | I | Conditionally push | Write-with-kill
Push M->E | Snoop clean | No | xxx | M | E | Conditionally push | Write-with-kill
tlbie | TLB invalidate | No | xxx | x | x | CRTRY TLBI | --
 |  |  |  |  |  | Pass TLBI | --
 |  |  |  |  |  | No action | --
sync | Synchronization | No | xxx | x | x | CRTRY sync | --
 |  |  |  |  |  | Pass sync | --
 |  |  |  |  |  | No action | --
Note that single-beat writes are not snooped in the write queue.
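As a reading aid for the load and store rows of Table 3-7, the following C sketch returns the state a block settles in once an access and any resulting reload have completed. It is a simplification under stated assumptions: the names are hypothetical, the load or store row is combined with the matching four-beat reload row (I to E for a read reload, I to M for a write reload), T = 0 is assumed, and stwcx., dcbx instructions, and snoops are not modeled.

```c
#include <assert.h>

typedef enum { MEI_I, MEI_E, MEI_M } mei_state;
typedef enum { ACCESS_LOAD, ACCESS_STORE } access_kind;

/* WIM written as a 3-bit value, matching the table's "x1x" style notation. */
#define WIM_W 0x4u
#define WIM_I 0x2u

/* State of the addressed block after the access (and any reload) completes. */
mei_state state_after_access(access_kind kind, unsigned wim, mei_state cur)
{
    assert(wim <= 7u);

    if (wim & WIM_I)
        return MEI_I;      /* x1x rows: I stays I; E and M blocks end up invalid */

    if (kind == ACCESS_STORE) {
        if (wim & WIM_W)
            return cur;    /* 10x rows: next state is "Same" */
        return MEI_M;      /* 00x rows: hit goes to M; miss issues RWITM, reloads as M */
    }

    /* x0x load: a hit leaves the state alone, a miss reloads the block as E. */
    return cur == MEI_I ? MEI_E : cur;
}
```

For instance, a cacheable write-back store (WIM = 00x) to an exclusive block leaves it modified, while a caching-inhibited load that finds a modified block leaves it invalid after the push.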
Chapter 4 Exceptions
This chapter describes the exceptions model for the MPC750. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
The OEA portion of the PowerPC architecture defines the mechanism by which processors of this family implement exceptions (referred to as interrupts in the architecture specification). Exception conditions may be defined at other levels of the architecture. For example, the UISA defines conditions that may cause floating-point exceptions; the OEA defines the mechanism by which the exception is taken.
The PowerPC exception mechanism allows the processor to change to supervisor state as a result of unusual conditions arising in the execution of instructions and from external signals, bus errors, or various internal conditions. When exceptions occur, information about the state of the processor is saved to certain registers and the processor begins execution at an address (exception vector) predetermined for each exception. Processing of exceptions begins in supervisor mode.
Although multiple exception conditions can map to a single exception vector, often a more specific condition may be determined by examining a register associated with the exception--for example, the DSISR and the floating-point status and control register (FPSCR). Also, software can explicitly enable or disable some exception conditions.
The PowerPC architecture requires that exceptions be taken in program order; therefore, although a particular implementation may recognize exception conditions out of order, they are handled strictly in order with respect to the instruction stream. When an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that have not yet entered the execute stage, are required to complete before the exception is taken.
For example, if a single instruction encounters multiple exception conditions, those exceptions are taken and handled sequentially. Likewise, exceptions that are asynchronous and precise are recognized when they occur, but are not handled until all instructions currently in the execute stage successfully complete execution and report their results.
Exception handlers must save the information stored in the machine status save/restore registers, SRR0 and SRR1, soon after the exception is taken, so that this information is not lost if another exception is taken. Because exceptions can occur while an exception handler routine is executing, multiple exceptions can become nested. It is up to the exception handler to save the necessary state information if control is to return to the excepting program.
In many cases, after the exception handler handles an exception, there is an attempt to execute the instruction that caused the exception. Instruction execution continues until the next exception condition is encountered. Recognizing and handling exception conditions sequentially guarantees that the machine state is recoverable and processing can resume without losing instruction results.
In this book, the following terms are used to describe the stages of exception processing:
Recognition   Exception recognition occurs when the condition that can cause an exception is identified by the processor.
Taken         An exception is said to be taken when control of instruction execution is passed to the exception handler; that is, the context is saved and the instruction at the appropriate vector offset is fetched and the exception handler routine is begun in supervisor mode.
Handling      Exception handling is performed by the software linked to the appropriate vector offset. Exception handling is begun in supervisor mode (referred to as privileged state in the architecture specification).
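The need to save SRR0 and SRR1 early in a handler can be illustrated with a small software model. This C sketch is not MPC750 code: the cpu_model structure and the function names are hypothetical, and it assumes a simplified view in which taking an exception copies the return address into SRR0 and the whole MSR into SRR1 (the real processor saves selected MSR bits and also modifies the MSR).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified register model; field names are illustrative. */
typedef struct {
    uint32_t pc;
    uint32_t msr;
    uint32_t srr0;
    uint32_t srr1;
} cpu_model;

/* Copy of SRR0/SRR1 that a handler keeps in its own storage. */
typedef struct {
    uint32_t srr0;
    uint32_t srr1;
} saved_context;

/* Hardware side: taking any exception overwrites SRR0 and SRR1. */
void take_exception(cpu_model *cpu, uint32_t vector)
{
    assert(cpu != 0);
    cpu->srr0 = cpu->pc;   /* where to resume */
    cpu->srr1 = cpu->msr;  /* machine state to restore */
    cpu->pc = vector;
}

/* Handler side: save SRR0/SRR1 before another exception can be taken. */
saved_context handler_save(const cpu_model *cpu)
{
    assert(cpu != 0);
    saved_context ctx = { cpu->srr0, cpu->srr1 };
    return ctx;
}

/* Handler side: restore the saved pair and return to the interrupted code. */
void handler_return(cpu_model *cpu, saved_context ctx)
{
    assert(cpu != 0);
    cpu->srr0 = ctx.srr0;
    cpu->srr1 = ctx.srr1;
    cpu->pc = cpu->srr0;
    cpu->msr = cpu->srr1;
}
```

In this model, if a second exception is taken while the first handler is running, SRR0 then points into the first handler; only the copy made by handler_save still identifies the original program.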
Note that the PowerPC architecture documentation refers to exceptions as interrupts. In this book, the term `interrupt' is reserved to refer to asynchronous exceptions and sometimes to the event that causes the exception. Also, the PowerPC architecture uses the word `exception' to refer to IEEE-defined floating-point exception conditions that may cause a program exception to be taken; see Section 4.5.7, "Program Exception (0x00700)." The occurrence of these IEEE exceptions may not cause an exception to be taken. IEEE-defined exceptions are referred to as IEEE floating-point exceptions or floating-point exceptions.
4.1 MPC750 Microprocessor Exceptions
As specified by the PowerPC architecture, exceptions can be either precise or imprecise and either synchronous or asynchronous. Asynchronous exceptions are caused by events external to the processor's execution; synchronous exceptions are caused by instructions.
The types of exceptions are shown in Table 4-1. Note that all exceptions except for the system management interrupt, thermal management, and performance monitor exception are defined, at least to some extent, by the PowerPC architecture.
Table 4-1. MPC750 Microprocessor Exception Classifications
Synchronous/Asynchronous | Precise/Imprecise | Exception Types
Asynchronous, nonmaskable | Imprecise | Machine check, system reset
Asynchronous, maskable | Precise | External interrupt, decrementer, system management interrupt, performance monitor interrupt, thermal management interrupt
Synchronous | Precise | Instruction-caused exceptions
These classifications are discussed in greater detail in Section 4.2, "Exception Recognition and Priorities." For a better understanding of how the MPC750 implements precise exceptions, see Chapter 6, "Instruction Timing." Exceptions implemented in the MPC750, and conditions that cause them, are listed in Table 4-2.
Table 4-2. Exceptions and Conditions
Exception Type | Vector Offset (hex) | Causing Conditions
Reserved | 00000 | --
System reset | 00100 | Assertion of either HRESET or SRESET or at power-on reset
Machine check | 00200 | Assertion of TEA during a data bus transaction, assertion of MCP, or an address, data, or L2 bus parity error. MSR[ME] must be set.
DSI | 00300 | As specified in the PowerPC architecture. For TLB misses on load, store, or cache operations, a DSI exception occurs if a page fault occurs.
ISI | 00400 | As defined by the PowerPC architecture
External interrupt | 00500 | MSR[EE] = 1 and INT is asserted
Alignment | 00600 |
    * A floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx, or ecowx instruction operand is not word-aligned.
    * A multiple/string load/store operation is attempted in little-endian mode.
    * An operand of a dcbz instruction is on a page that is write-through or cache-inhibited for a virtual mode access.
    * An attempt to execute a dcbz instruction occurs when the cache is disabled.
Program | 00700 | As defined by the PowerPC architecture
Floating-point unavailable | 00800 | As defined by the PowerPC architecture
Decrementer | 00900 | As defined by the PowerPC architecture, when the most-significant bit of the DEC register changes from 0 to 1 and MSR[EE] = 1
Reserved | 00A00-00BFF | --
System call | 00C00 | Execution of the System Call (sc) instruction
Trace | 00D00 | MSR[SE] = 1 or a branch instruction is completing and MSR[BE] = 1. The MPC750 differs from the OEA by not taking this exception on an isync.
Reserved | 00E00 | The MPC750 does not generate an exception to this vector. Other processors that implement the PowerPC architecture may use this vector for floating-point assist exceptions.
Reserved | 00E10-00EFF | --
Performance monitor | 00F00 | The limit specified in PMCn is met and MMCR0[ENINT] = 1 (MPC750-specific)
Instruction address breakpoint | 01300 | IABR[0-29] matches EA[0-29] of the next instruction to complete, IABR[TE] matches MSR[IR], and IABR[BE] = 1 (MPC750-specific)
System management interrupt | 01400 | MSR[EE] = 1 and SMI is asserted (MPC750-specific)
Reserved | 01500-016FF | --
Thermal management interrupt | 01700 | Thermal management is enabled, junction temperature exceeds the threshold specified in THRM1 or THRM2, and MSR[EE] = 1 (MPC750-specific)
Reserved | 01800-02FFF | --
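Software that decodes or logs exceptions often keeps the vector offsets of Table 4-2 in a lookup table. The following C sketch shows one way to do that; the array and function names are hypothetical, only the non-reserved rows of the table are included, and offsets are given relative to the vector base, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    offset;  /* vector offset from Table 4-2 */
    const char *name;
} exception_vector;

static const exception_vector mpc750_vectors[] = {
    { 0x00100u, "System reset" },
    { 0x00200u, "Machine check" },
    { 0x00300u, "DSI" },
    { 0x00400u, "ISI" },
    { 0x00500u, "External interrupt" },
    { 0x00600u, "Alignment" },
    { 0x00700u, "Program" },
    { 0x00800u, "Floating-point unavailable" },
    { 0x00900u, "Decrementer" },
    { 0x00C00u, "System call" },
    { 0x00D00u, "Trace" },
    { 0x00F00u, "Performance monitor" },
    { 0x01300u, "Instruction address breakpoint" },
    { 0x01400u, "System management interrupt" },
    { 0x01700u, "Thermal management interrupt" },
};

/* Returns the exception name for a vector offset, or NULL if the offset
 * is reserved or is not the start of a vector listed in Table 4-2. */
const char *exception_name(uint32_t offset)
{
    assert(offset <= 0x02FFFu);  /* Table 4-2 covers 00000-02FFF */

    for (size_t i = 0; i < sizeof mpc750_vectors / sizeof mpc750_vectors[0]; i++) {
        if (mpc750_vectors[i].offset == offset)
            return mpc750_vectors[i].name;
    }
    return NULL;
}
```

A lookup of 0x00500 returns the external interrupt entry, while a reserved offset such as 0x00A00 returns NULL.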
4.2 Exception Recognition and Priorities
Exceptions are roughly prioritized by exception class, as follows:
1. Nonmaskable, asynchronous exceptions have priority over all other exceptions--system reset and machine check exceptions (although the machine check exception condition can be disabled so the condition causes the processor to go directly into the checkstop state). These exceptions cannot be delayed and do not wait for completion of any precise exception handling.
2. Synchronous, precise exceptions are caused by instructions and are taken in strict program order.
3. Imprecise exceptions (imprecise mode floating-point enabled exceptions) are caused by instructions and they are delayed until higher priority exceptions are taken. Note that the MPC750 does not implement an exception of this type.
4. Maskable asynchronous exceptions (external, decrementer, thermal management, system management, performance monitor, and interrupt exceptions) are delayed until higher priority exceptions are taken.
The following list of exception categories describes how the MPC750 handles exceptions up to the point of signaling the appropriate interrupt to occur. Note that a recoverable state is reached if the completed store queue is empty (drained, not canceled) and any instruction that is next in program order and has been signaled to complete has completed. If MSR[RI] = 0, the MPC750 is in a nonrecoverable state. Also, instruction completion is defined as updating all architectural registers associated with that instruction, and then removing that instruction from the completion buffer.
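The four-class ordering above can be sketched as a priority pick over a set of pending classes. This C fragment is only a model of the rough ordering: the enum, the bitmask representation, and the function name are hypothetical, and it ignores the finer rules (recoverable state, masking through MSR bits, ordering within a class) that the rest of this section describes.

```c
#include <assert.h>

/* Exception classes in the priority order given in the text (0 = highest). */
typedef enum {
    EXC_CLASS_NONMASKABLE_ASYNC = 0,  /* system reset, machine check */
    EXC_CLASS_SYNC_PRECISE,           /* instruction-caused, program order */
    EXC_CLASS_IMPRECISE,              /* not implemented by the MPC750 */
    EXC_CLASS_MASKABLE_ASYNC,         /* external, decrementer, and so on */
    EXC_CLASS_COUNT
} exc_class;

/*
 * pending has bit c set when an exception of class c is pending.
 * Returns the class to take first, or -1 if nothing is pending.
 */
int next_exception_class(unsigned pending)
{
    assert(pending < (1u << EXC_CLASS_COUNT));

    for (int c = 0; c < EXC_CLASS_COUNT; c++) {
        if (pending & (1u << c))
            return c;
    }
    return -1;
}
```

If both a synchronous precise exception and a maskable asynchronous exception are pending, the function selects the synchronous one, and the asynchronous exception remains pending.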
* Exceptions caused by asynchronous events (interrupts). These exceptions are further distinguished by whether they are maskable and recoverable.
  -- Asynchronous, nonmaskable, nonrecoverable
     System reset for assertion of HRESET--Has highest priority and is taken immediately regardless of other pending exceptions or recoverability. (Includes power-on reset)
  -- Asynchronous, maskable, nonrecoverable
     Machine check exception--Has priority over any other pending exception except system reset for assertion of HRESET. Taken immediately regardless of recoverability.
  -- Asynchronous, nonmaskable, recoverable
     System reset for SRESET--Has priority over any other pending exception except system reset for HRESET (or power-on reset), or machine check. Taken immediately when a recoverable state is reached.
  -- Asynchronous, maskable, recoverable
     System management, performance monitor, thermal management, external, and decrementer interrupts--Before handling this type of exception, the next instruction in program order must complete. If that instruction causes another type of exception, that exception is taken and the asynchronous, maskable recoverable exception remains pending, until the instruction completes. Further instruction completion is halted. The asynchronous, maskable recoverable exception is taken when a recoverable state is reached.
* Instruction-related exceptions. These exceptions are further organized into the point in instruction processing in which they generate an exception.
  -- Instruction fetch
     ISI exceptions--Once this type of exception is detected, dispatching stops and the current instruction stream is allowed to drain out of the machine. If completing any of the instructions in this stream causes an exception, that exception is taken and the instruction fetch exception is discarded (but may be encountered again when instruction processing resumes). Otherwise, once all pending instructions have executed and a recoverable state is reached, the ISI exception is taken.
  -- Instruction dispatch/execution
     Program, DSI, alignment, floating-point unavailable, system call, and instruction address breakpoint--This type of exception is determined during dispatch or execution of an instruction. The exception remains pending until all instructions before the exception-causing instruction in program order complete. The exception is then taken without completing the exception-causing instruction. If completing these previous instructions causes an exception, that exception takes priority over the pending instruction dispatch/execution exception, which is then discarded (but may be encountered again when instruction processing resumes).
  -- Post-instruction execution
     Trace--Trace exceptions are generated following execution and completion of an instruction while trace mode is enabled. If executing the instruction produces conditions for another type of exception, that exception is taken and the post-instruction exception is forgotten for that instruction.

Note that these exception classifications correspond to how exceptions are prioritized, as described in Table 4-3.
Table 4-3. MPC750 Exception Priorities
Priority  Exception                       Cause

Asynchronous Exceptions (Interrupts)
0         System reset                    Power on reset, assertion of HRESET and TRST (hard reset)
1         Machine check                   Any enabled machine check condition (L2 data parity error, assertion of TEA or MCP)
2         System reset                    Assertion of SRESET (soft reset)
3         System management               Assertion of SMI
4         External interrupt              Assertion of INT
5         Performance monitor             Any programmer-specified performance monitor condition
6         Decrementer                     Decrementer passes through zero
7         Thermal management              Any programmer-specified thermal management condition

Instruction Fetch Exceptions
0         ISI                             Any ISI exception condition

Instruction Dispatch/Execution Exceptions
0         Instruction address breakpoint  Any instruction address breakpoint exception condition
1         Program                         Occurrence of an illegal instruction, privileged instruction, or trap exception condition. Note that floating-point enabled program exceptions have lower priority.
2         System call                     System Call (sc) instruction
3         Floating-point unavailable      Any floating-point unavailable exception condition
4         Program                         A floating-point enabled exception condition (lowest-priority program exception)
5         DSI                             DSI exception due to eciwx, ecowx with EAR[E] = 0 (DSISR[11]). Lower priority DSI exception conditions are shown below.
6         Alignment                       Any alignment exception condition, prioritized as follows:
                                          1 Floating-point access not word-aligned
                                          2 lmw, stmw, lwarx, stwcx. not word-aligned
                                          3 eciwx or ecowx not word-aligned
                                          4 Multiple or string access with MSR[LE] set
                                          5 dcbz to write-through or cache-inhibited page or cache is disabled
7         DSI                             BAT page protection violation
8         DSI                             Any access except cache operations to a segment where SR[T] = 1 (DSISR[5]) or an access crosses from a T = 0 segment to one where T = 1 (DSISR[5])
9         DSI                             TLB page protection violation
10        DSI                             DABR address match

Post-Instruction Execution Exceptions
11        Trace                           MSR[SE] = 1 (or MSR[BE] = 1 for branches)
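The asynchronous part of Table 4-3 amounts to a fixed-priority arbiter. The following C sketch is only an illustration of that ordering: the enum names and the "pending" bitmask are hypothetical (they are not Motorola definitions), and masking through MSR[EE] and the recoverable-state conditions are deliberately not modeled.

```c
#include <stdint.h>

/* Asynchronous exception sources in Table 4-3 priority order
 * (0 = highest priority). Names are illustrative only. */
enum async_exc {
    EXC_NONE = -1,
    EXC_HARD_RESET = 0,   /* power-on reset, HRESET */
    EXC_MACHINE_CHECK,    /* any enabled machine check condition */
    EXC_SOFT_RESET,       /* SRESET */
    EXC_SMI,              /* system management */
    EXC_EXTERNAL,         /* INT */
    EXC_PERF_MONITOR,
    EXC_DECREMENTER,
    EXC_THERMAL,
    EXC_ASYNC_COUNT
};

/* pending: hypothetical bitmask in which bit n set means the source
 * with priority n is pending. Returns the highest-priority pending
 * source, or EXC_NONE if nothing is pending. */
int highest_priority_async(uint32_t pending)
{
    for (int p = 0; p < EXC_ASYNC_COUNT; p++) {
        if (pending & (UINT32_C(1) << p))
            return p;
    }
    return EXC_NONE;
}
```

For example, if both the decrementer and SMI are pending, the function selects the system management interrupt because it sits higher in the table.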
System reset and machine check exceptions may occur at any time and are not delayed even if an exception is being handled. As a result, state information for an interrupted exception may be lost; therefore, these exceptions are typically nonrecoverable. An exception may not be taken immediately when it is recognized.
4.3
Exception Processing
When an exception is taken, the processor uses SRR0 and SRR1 to save the contents of the MSR for the current context and to identify where instruction execution should resume after the exception is handled. When an exception occurs, the address saved in SRR0 helps determine where instruction processing should resume when the exception handler returns control to the interrupted process. Depending on the exception, execution may resume at the address in SRR0 or at the next address in the program flow. All instructions in the program flow preceding the instruction addressed by SRR0 will have completed execution, and no subsequent instruction will have begun execution. SRR0 may hold the address of the instruction that caused the exception or of the next one (as in the case of a system call, trace, or trap exception). The SRR0 register is shown in Figure 4-1.
Figure 4-1. Machine Status Save/Restore Register 0 (SRR0). Bits 0-31: SRR0 (holds EA for instruction in interrupted program flow).
SRR1 is used to save machine status (selected MSR bits and possibly other status bits as well) on exceptions and to restore those values when an rfi instruction is executed. SRR1 is shown in Figure 4-2.
Figure 4-2. Machine Status Save/Restore Register 1 (SRR1). Bits 0-31: exception-specific information and MSR bit values.
For most exceptions, bits 2-4 and 10-12 of SRR1 are loaded with exception-specific information and MSR[5-9, 16-31] are placed into the corresponding bit positions of SRR1. The MPC750's MSR is shown in Figure 4-3.
Bit:    0-12      13   14  15   16  17  18  19  20   21  22  23   24  25  26  27  28  29  30  31
Field:  Reserved  POW  0   ILE  EE  PR  FP  ME  FE0  SE  BE  FE1  0   IP  IR  DR  0   PM  RI  LE
Figure 4-3. Machine State Register (MSR)
The MSR bits are defined in Table 4-4.
Table 4-4. MSR Bit Settings
Bits   Name  Description
0      --    Reserved. Full function. (Note 1)
1-4    --    Reserved. Partial function. (Note 1)
5-9    --    Reserved. Full function. (Note 1)
10-12  --    Reserved. Partial function. (Note 1)
13     POW   Power management enable. Power management functions are implementation-dependent. See Chapter 10, "Power and Thermal Management."
             0 Power management disabled (normal operation mode).
             1 Power management enabled (reduced power mode).
14     --    Reserved. Implementation-specific.
15     ILE   Exception little-endian mode. When an exception occurs, this bit is copied into MSR[LE] to select the endian mode for the context established by the exception.
16     EE    External interrupt enable
             0 The processor delays recognition of external interrupts and decrementer exception conditions.
             1 The processor is enabled to take an external interrupt or the decrementer exception.
17     PR    Privilege level
             0 The processor can execute both user- and supervisor-level instructions.
             1 The processor can only execute user-level instructions.
18     FP    Floating-point available
             0 The processor prevents dispatch of floating-point instructions, including floating-point loads, stores, and moves.
             1 The processor can execute floating-point instructions and can take floating-point enabled program exceptions.
19     ME    Machine check enable
             0 Machine check exceptions are disabled.
             1 Machine check exceptions are enabled.
20     FE0   IEEE floating-point exception mode 0 (see Table 4-5).
21     SE    Single-step trace enable
             0 The processor executes instructions normally.
             1 The processor generates a single-step trace exception upon the successful execution of every instruction except rfi, isync, and sc. Successful execution means that the instruction caused no other exception.
22     BE    Branch trace enable
             0 The processor executes branch instructions normally.
             1 The processor generates a branch type trace exception when a branch instruction executes successfully.
23     FE1   IEEE floating-point exception mode 1 (see Table 4-5).
24     --    Reserved. This bit corresponds to the AL bit of the POWER architecture.
25     IP    Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended with Fs or 0s. In the following description, nnnnn is the offset of the exception.
             0 Exceptions are vectored to the physical address 0x000n_nnnn.
             1 Exceptions are vectored to the physical address 0xFFFn_nnnn.
26     IR    Instruction address translation
             0 Instruction address translation is disabled.
             1 Instruction address translation is enabled.
             For more information see Chapter 5, "Memory Management."
27     DR    Data address translation
             0 Data address translation is disabled.
             1 Data address translation is enabled.
             For more information see Chapter 5, "Memory Management."
28     --    Reserved. Full function. (Note 1)
29     PM    Performance monitor marked mode. MPC750-specific; defined as reserved by the PowerPC architecture. For more information about the performance monitor, see Section 4.5.13, "Performance Monitor Interrupt (0x00F00)."
             0 Process is not a marked process.
             1 Process is a marked process.
30     RI    Indicates whether system reset or machine check exception is recoverable. RI indicates whether from the perspective of the processor, it is safe to continue (that is, processor state data such as that saved to SRR0 is valid), but it does not guarantee that the interrupted process is recoverable.
             0 Exception is not recoverable.
             1 Exception is recoverable.
31     LE    Little-endian mode enable
             0 The processor runs in big-endian mode.
             1 The processor runs in little-endian mode.
Note 1: Full function reserved bits are saved in SRR1 when an exception occurs; partial function reserved bits are not saved.
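Because the PowerPC convention numbers bit 0 as the most significant bit, the bit positions in Table 4-4 do not translate directly into C shift counts. The sketch below shows one way the table could be turned into masks; the macro and helper names are illustrative choices, not definitions taken from this manual.

```c
#include <stdint.h>

/* PowerPC numbers bits from the most significant end: in a 32-bit
 * register, bit 0 is the MSB and bit 31 is the LSB. */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))

/* MSR fields from Table 4-4 (macro names are illustrative). */
#define MSR_POW  PPC_BIT32(13)  /* power management enable */
#define MSR_ILE  PPC_BIT32(15)  /* exception little-endian mode */
#define MSR_EE   PPC_BIT32(16)  /* external interrupt enable */
#define MSR_PR   PPC_BIT32(17)  /* privilege level (1 = user) */
#define MSR_FP   PPC_BIT32(18)  /* floating-point available */
#define MSR_ME   PPC_BIT32(19)  /* machine check enable */
#define MSR_FE0  PPC_BIT32(20)  /* FP exception mode 0 */
#define MSR_SE   PPC_BIT32(21)  /* single-step trace enable */
#define MSR_BE   PPC_BIT32(22)  /* branch trace enable */
#define MSR_FE1  PPC_BIT32(23)  /* FP exception mode 1 */
#define MSR_IP   PPC_BIT32(25)  /* exception prefix */
#define MSR_IR   PPC_BIT32(26)  /* instruction address translation */
#define MSR_DR   PPC_BIT32(27)  /* data address translation */
#define MSR_PM   PPC_BIT32(29)  /* performance monitor marked mode */
#define MSR_RI   PPC_BIT32(30)  /* recoverable exception */
#define MSR_LE   PPC_BIT32(31)  /* little-endian mode enable */

/* Small helpers that read a saved MSR image. */
static inline int msr_is_user_mode(uint32_t msr)
{
    return (msr & MSR_PR) != 0;
}

static inline int msr_translation_enabled(uint32_t msr)
{
    return (msr & (MSR_IR | MSR_DR)) == (MSR_IR | MSR_DR);
}
```

With this numbering, MSR[EE] (bit 16) corresponds to the mask 0x00008000 and MSR[LE] (bit 31) to 0x00000001.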
The IEEE floating-point exception mode bits (FE0 and FE1) together define whether floating-point exceptions are handled precisely, imprecisely, or whether they are taken at all. As shown in Table 4-5, if either FE0 or FE1 is set, the MPC750 treats exceptions as precise. MSR bits are guaranteed to be written to SRR1 when the first instruction of the exception handler is encountered. For further details, see Chapter 6, "Exceptions," of the Programming Environments Manual.
Table 4-5. IEEE Floating-Point Exception Mode Bits
FE0  FE1  Mode
0    0    Floating-point exceptions disabled
0    1    Imprecise nonrecoverable. For this setting, the MPC750 operates in floating-point precise mode.
1    0    Imprecise recoverable. For this setting, the MPC750 operates in floating-point precise mode.
1    1    Floating-point precise mode
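Table 4-5 can be read as a two-bit decode. The following sketch separates the architecturally named mode from what the table says the MPC750 actually does; the enum and function names are hypothetical, and the masks assume the MSR bit positions given in Table 4-4 (FE0 = bit 20, FE1 = bit 23).

```c
#include <stdint.h>

#define MSR_FE0  UINT32_C(0x00000800)  /* MSR bit 20 */
#define MSR_FE1  UINT32_C(0x00000100)  /* MSR bit 23 */

/* Mode names as listed in Table 4-5, indexed by (FE0 << 1) | FE1. */
enum fp_exc_mode {
    FP_EXC_DISABLED = 0,             /* FE0 = 0, FE1 = 0 */
    FP_EXC_IMPRECISE_NONRECOVERABLE, /* FE0 = 0, FE1 = 1 */
    FP_EXC_IMPRECISE_RECOVERABLE,    /* FE0 = 1, FE1 = 0 */
    FP_EXC_PRECISE                   /* FE0 = 1, FE1 = 1 */
};

/* Mode named by the FE0/FE1 encoding. */
enum fp_exc_mode fp_named_mode(uint32_t msr)
{
    int fe0 = (msr & MSR_FE0) != 0;
    int fe1 = (msr & MSR_FE1) != 0;
    return (enum fp_exc_mode)((fe0 << 1) | fe1);
}

/* Per Table 4-5, the MPC750 operates in floating-point precise mode
 * whenever either bit is set. Returns 1 if floating-point enabled
 * exceptions are taken (precisely), 0 if they are disabled. */
int mpc750_fp_exceptions_precise(uint32_t msr)
{
    return (msr & (MSR_FE0 | MSR_FE1)) != 0;
}
```

The second function reflects the point of the table: the two "imprecise" encodings behave like precise mode on this implementation.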
4.3.1
Enabling and Disabling Exceptions
When a condition exists that may cause an exception to be generated, it must be determined whether the exception is enabled for that condition.
* IEEE floating-point enabled exceptions (a type of program exception) are ignored when both MSR[FE0] and MSR[FE1] are cleared. If either bit is set, all IEEE enabled floating-point exceptions are taken and cause a program exception.
* Asynchronous, maskable exceptions (such as the external and decrementer interrupts) are enabled by setting MSR[EE]. When MSR[EE] = 0, recognition of these exception conditions is delayed. MSR[EE] is cleared automatically when an exception is taken to delay recognition of conditions causing those exceptions.
* A machine check exception can occur only if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop state when a machine check exception condition occurs. Individual machine check exceptions can be enabled and disabled through bits in the HID0 register, which is described in Table 4-8.
* System reset exceptions cannot be masked.
4.3.2
Steps for Exception Processing
After it is determined that the exception can be taken (by confirming that any instruction-caused exceptions occurring earlier in the instruction stream have been handled, and by confirming that the exception is enabled for the exception condition), the processor does the following:
1. SRR0 is loaded with an instruction address that depends on the type of exception. See the individual exception description for details about how this register is used for specific exceptions.
2. SRR1[1-4, 10-15] are loaded with information specific to the exception type.
3. SRR1[5-9, 16-31] are loaded with a copy of the corresponding MSR bits. Depending on the implementation, reserved bits may not be copied.
4. The MSR is set as described in Table 4-4. The new values take effect as the first instruction of the exception-handler routine is fetched. Note that MSR[IR] and MSR[DR] are cleared for all exception types; therefore, address translation is disabled for both instruction fetches and data accesses beginning with the first instruction of the exception-handler routine.
5. Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception type. The location is determined by adding the exception's vector (see Table 4-2) to the base address determined by MSR[IP]. If IP is cleared, exceptions are vectored to the physical address 0x000n_nnnn. If IP is set, exceptions are vectored to the physical address 0xFFFn_nnnn.

For a machine check exception that occurs when MSR[ME] = 0 (machine check exceptions are disabled), the checkstop state is entered (the machine stops executing instructions). See Section 4.5.2, "Machine Check Exception (0x00200)."
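The vector computation in step 5 can be written out as a short calculation. This is a minimal sketch, assuming MSR[IP] is bit 25 (mask 0x00000040, from Table 4-4); the function and macro names are illustrative, and only offsets that appear in this chapter are listed.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IP  UINT32_C(0x00000040)  /* MSR bit 25, exception prefix */

/* Vector offsets named in this chapter (macro names are illustrative). */
#define VEC_SYSTEM_RESET   UINT32_C(0x00100)
#define VEC_MACHINE_CHECK  UINT32_C(0x00200)
#define VEC_DSI            UINT32_C(0x00300)
#define VEC_ISI            UINT32_C(0x00400)
#define VEC_EXTERNAL       UINT32_C(0x00500)
#define VEC_ALIGNMENT      UINT32_C(0x00600)
#define VEC_PROGRAM        UINT32_C(0x00700)
#define VEC_PERF_MONITOR   UINT32_C(0x00F00)
#define VEC_THERMAL        UINT32_C(0x01700)

/* Physical address at which the handler's first instruction is fetched:
 * 0x000n_nnnn when MSR[IP] = 0, 0xFFFn_nnnn when MSR[IP] = 1. */
uint32_t exception_vector(uint32_t msr, uint32_t offset)
{
    /* The offset must fit in the five-digit n_nnnn field. */
    assert((offset & ~UINT32_C(0x000FFFFF)) == 0);

    uint32_t base = (msr & MSR_IP) ? UINT32_C(0xFFF00000) : UINT32_C(0);
    return base | offset;
}
```

For instance, a system reset with MSR[IP] = 1 vectors to 0xFFF0_0100, and with MSR[IP] = 0 to 0x0000_0100.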
4.3.3
Setting MSR[RI]
An operating system may handle MSR[RI] as follows:
* In the machine check and system reset exceptions--If MSR[RI] is cleared, the exception is not recoverable. If it is set, the exception is recoverable with respect to the processor.
* In each exception handler--When enough state information has been saved that a machine check or system reset exception can reconstruct the previous state, set MSR[RI].
* In each exception handler--Clear MSR[RI], set SRR0 and SRR1 appropriately, and then execute rfi.
Note that the RI bit being set indicates that, with respect to the processor, enough processor state data remains valid for the processor to continue, but it does not guarantee that the interrupted process can resume.
4.3.4
Returning from an Exception Handler
The Return from Interrupt (rfi) instruction performs context synchronization by allowing previously-issued instructions to complete before returning to the interrupted process. In general, execution of the rfi instruction ensures the following:
* All previous instructions have completed to a point where they can no longer cause an exception. If a previous instruction causes a direct-store interface error exception, the results must be determined before this instruction is executed.
* Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.
* The rfi instruction copies SRR1 bits back into the MSR.
* Instructions fetched after this instruction execute in the context established by this instruction.
* Program execution resumes at the instruction indicated by SRR0.
For a complete description of context synchronization, refer to Chapter 6, "Exceptions," of the Programming Environments Manual.
4.4
Process Switching
The following instructions are useful for restoring proper context during process switching:
* The sync instruction orders the effects of instruction execution. All instructions previously initiated appear to have completed before the sync instruction completes, and no subsequent instructions appear to be initiated until the sync instruction completes. For an example showing use of sync, see Chapter 2, "PowerPC Register Set," of the Programming Environments Manual.
* The isync instruction waits for all previous instructions to complete and then discards any fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context (privilege, translation, and protection) established by the previous instructions.
* The stwcx. instruction clears any outstanding reservations, ensuring that an lwarx instruction in an old process is not paired with an stwcx. instruction in a new one.
The operating system should set MSR[RI] as described in Section 4.3.3, "Setting MSR[RI]."
4.5
Exception Definitions
Table 4-6 shows all the types of exceptions that can occur with the MPC750 and MSR settings when the processor goes into supervisor mode due to an exception. Depending on the exception, certain of these bits are stored in SRR1 when an exception is taken.
Table 4-6. MSR Setting Due to Exception
Exception Type              POW  ILE  EE  PR  FP  ME  FE0  SE  BE  FE1  IP  IR  DR  PM  RI  LE
System reset                0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Machine check               0    --   0   0   0   0   0    0   0   0    --  0   0   0   0   ILE
DSI                         0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
ISI                         0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
External interrupt          0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Alignment                   0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Program                     0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Floating-point unavailable  0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Decrementer interrupt       0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
System call                 0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Trace exception             0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
System management           0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Performance monitor         0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE
Thermal management          0    --   0   0   0   --  0    0   0   0    --  0   0   0   0   ILE

0    Bit is cleared.
ILE  Bit is copied from the MSR[ILE].
--   Bit is not altered.
Reserved bits are read as if written as 0.
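Table 4-6 is uniform enough to be expressed as a single transformation of the MSR. The sketch below is one possible reading of the table, not a description of the hardware logic: the function name and the is_machine_check parameter are hypothetical, the masks follow the bit positions in Table 4-4, and reserved bits are simply treated as zero.

```c
#include <stdint.h>

#define MSR_ILE  UINT32_C(0x00010000)  /* MSR bit 15 */
#define MSR_ME   UINT32_C(0x00001000)  /* MSR bit 19 */
#define MSR_IP   UINT32_C(0x00000040)  /* MSR bit 25 */
#define MSR_LE   UINT32_C(0x00000001)  /* MSR bit 31 */

/* Computes the MSR value in effect when the handler starts, following
 * Table 4-6: ILE and IP are not altered, ME is not altered except on a
 * machine check (where it is cleared), LE receives the old ILE value,
 * and every other named bit is cleared. */
uint32_t msr_on_exception(uint32_t msr, int is_machine_check)
{
    uint32_t preserved = MSR_ILE | MSR_IP;
    if (!is_machine_check)
        preserved |= MSR_ME;

    uint32_t next = msr & preserved;
    if (msr & MSR_ILE)
        next |= MSR_LE;   /* LE is set to the value of ILE */
    return next;
}
```

Note how EE, PR, IR, DR, and RI all come out as zero: the handler starts in supervisor mode with external interrupts masked, translation off, and MSR[RI] cleared.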
The setting of the exception prefix bit (IP) determines how exceptions are vectored. If the bit is cleared, exceptions are vectored to the physical address 0x000n_nnnn (where nnnnn is the vector offset); if IP is set, exceptions are vectored to physical address 0xFFFn_nnnn. Table 4-2 shows the exception vector offset of the first instruction of the exception handler routine for each exception type.
4.5.1
System Reset Exception (0x00100)
The MPC750 implements the system reset exception as defined in the PowerPC architecture (OEA). The system reset exception is a nonmaskable, asynchronous exception signaled to the processor through the assertion of system-defined signals. In the MPC750, the exception is signaled by the assertion of either the SRESET or HRESET inputs, described more fully in Chapter 7, "Signal Descriptions." Table 4-7 lists register settings when a system reset exception is taken.
Table 4-7. System Reset Exception--Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      0      Loaded with equivalent MSR bits
          1-4    Cleared
          5-9    Loaded with equivalent MSR bits
          10-15  Cleared
          16-31  Loaded with equivalent MSR bits
          Note that if the processor state is corrupted to the extent that execution cannot resume reliably, MSR[RI] (SRR1[30]) is cleared.
MSR       POW 0     ILE --    EE  0     PR  0
          FP  0     ME  --    FE0 0     SE  0
          BE  0     FE1 0     IP  --    IR  0
          DR  0     PM  0     RI  0     LE  Set to value of ILE
If SRESET is asserted, the processor is first put in a recoverable state. To do this, the MPC750 allows any instruction at the point of completion to either complete or take an exception, blocks completion of any following instructions and allows the completion queue to drain. The state before the exception occurred is then saved as specified in the PowerPC architecture and instruction fetching begins at the system reset interrupt vector offset, 0x00100. The vector address on a soft reset depends on the setting of MSR[IP] (either 0x0000_0100 or 0xFFF0_0100). Soft resets are third in priority, after hard reset and machine check. This exception is recoverable provided attaining a recoverable state does not generate a machine check.

SRESET is an edge-sensitive signal that can be asserted and deasserted asynchronously, provided the minimum pulse width specified in the hardware specifications is met. Asserting SRESET causes the MPC750 to take a system reset exception. This exception modifies the MSR, SRR0, and SRR1, as described in the Programming Environments Manual. Unlike hard reset, soft reset does not directly affect the states of output signals. Attempts to use SRESET during a hard reset sequence or while the JTAG logic is non-idle cause unpredictable results.

A hard reset is initiated by asserting HRESET. Hard reset is used primarily for power-on reset (POR) (in which case TRST must also be asserted), but can also be used to restart a running processor. The HRESET signal must be asserted during power up and must remain asserted for a period that allows the PLL to achieve lock and the internal logic to be reset. This period is specified in the hardware specifications. The MPC750 internal state after the hard reset interval is defined in Table 2-19. If HRESET is asserted for less than this amount of time, the results are not predictable. If HRESET is asserted during normal operation, all operations cease and the machine state is lost.
The MPC750 implements HID0[NHR], which helps software distinguish a hard reset from a soft reset. Because this bit is cleared by a hard reset, but not by a soft reset, software can set this bit after a hard reset and tell whether a subsequent reset is a hard or soft reset by examining whether this bit is still set. See Section 2.1.2.2, "Hardware Implementation-Dependent Register 0."
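The HID0[NHR] protocol described above can be sketched in C as follows. This is a host-runnable model only: it operates on a plain 32-bit copy of HID0, whereas real startup code would read and write the register itself (that access is not shown here). The function names are hypothetical; the mask assumes NHR is HID0 bit 15, as listed in Table 4-8.

```c
#include <stdint.h>

#define HID0_NHR  UINT32_C(0x00010000)  /* HID0 bit 15, not hard reset */

/* Returns 1 if the reset being handled was a hard reset, that is,
 * if NHR is clear. This is meaningful only when software sets NHR
 * after every hard reset, as sketched in mark_not_hard_reset(). */
int last_reset_was_hard(uint32_t hid0)
{
    return (hid0 & HID0_NHR) == 0;
}

/* Value to write back to HID0 once hard-reset initialization is done,
 * so that a later soft reset can be told apart from a hard reset. */
uint32_t mark_not_hard_reset(uint32_t hid0)
{
    return hid0 | HID0_NHR;
}
```

A reset handler following this scheme would call last_reset_was_hard() on entry, run its cold-start path when the result is 1, and then write back the value returned by mark_not_hard_reset().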
4.5.2
Machine Check Exception (0x00200)
The MPC750 implements the machine check exception as defined in the PowerPC architecture (OEA). It conditionally initiates a machine check exception after an address or data parity error occurs on the bus or in the L2 cache, after receiving a qualified transfer error acknowledge (TEA) indication on the MPC750 bus, or after the machine check interrupt (MCP) signal is asserted. As defined in the OEA, the exception is not taken if MSR[ME] is cleared, in which case the processor enters checkstop state.
Certain machine check conditions can be enabled and disabled using HID0 bits, as described in Table 4-8.
Table 4-8. HID0 Machine Check Enable Bits
Bits  Name  Function
0     EMCP  Enable MCP. The primary purpose of this bit is to mask out further machine check exceptions caused by assertion of MCP, similar to how MSR[EE] can mask external interrupts.
            0 Masks MCP. Asserting MCP does not generate a machine check exception or a checkstop.
            1 Asserting MCP causes a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
1     DBP   Enable/disable 60x bus address and data parity generation.
            0 If address or data parity is not used by the system and the respective parity checking is disabled (HID0[EBA] or HID0[EBD] = 0), input receivers for those signals are disabled, do not require pull-up resistors, and therefore should be left unconnected. If all parity generation is disabled, all parity checking should also be disabled and parity signals need not be connected.
            1 Parity generation is enabled.
2     EBA   Enable/disable 60x bus address parity checking.
            0 Prevents address parity checking.
            1 Allows an address parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
            EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
3     EBD   Enable 60x bus data parity checking
            0 Parity checking is disabled.
            1 Allows a data parity error to cause a checkstop if MSR[ME] = 0 or a machine check exception if MSR[ME] = 1.
            EBA and EBD allow the processor to operate with memory subsystems that do not generate parity.
15    NHR   Not hard reset (software use only)
            0 A hard reset occurred if software had previously set this bit.
            1 A hard reset has not occurred.
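The interaction between the HID0 enable bits in Table 4-8 and MSR[ME] can be summarized as a small decision function. The sketch below covers only the three sources whose enable bits appear in the table (MCP, address parity, data parity); TEA and L2 parity conditions are intentionally left out because their enables are not described here. All type and function names are hypothetical.

```c
#include <stdint.h>

#define MSR_ME     UINT32_C(0x00001000)  /* MSR bit 19 */
#define HID0_EMCP  UINT32_C(0x80000000)  /* HID0 bit 0 */
#define HID0_EBA   UINT32_C(0x20000000)  /* HID0 bit 2 */
#define HID0_EBD   UINT32_C(0x10000000)  /* HID0 bit 3 */

enum mc_source  { MC_SRC_MCP, MC_SRC_ADDR_PARITY, MC_SRC_DATA_PARITY };
enum mc_outcome { MC_IGNORED, MC_EXCEPTION, MC_CHECKSTOP };

/* Decides what happens when the given condition occurs:
 * - source masked in HID0    -> nothing happens
 * - enabled and MSR[ME] = 1  -> machine check exception
 * - enabled and MSR[ME] = 0  -> checkstop */
enum mc_outcome machine_check_outcome(uint32_t msr, uint32_t hid0,
                                      enum mc_source src)
{
    uint32_t enable;

    switch (src) {
    case MC_SRC_MCP:         enable = HID0_EMCP; break;
    case MC_SRC_ADDR_PARITY: enable = HID0_EBA;  break;
    case MC_SRC_DATA_PARITY: enable = HID0_EBD;  break;
    default:                 return MC_IGNORED;
    }

    if ((hid0 & enable) == 0)
        return MC_IGNORED;
    return (msr & MSR_ME) ? MC_EXCEPTION : MC_CHECKSTOP;
}
```

The two-level structure is the point of the example: HID0 decides whether a condition is seen at all, and MSR[ME] then selects between a handled exception and a checkstop.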
A TEA indication on the bus can result from any load or store operation initiated by the processor. In general, TEA is expected to be used by a memory controller to indicate that a memory parity error or an uncorrectable memory ECC error has occurred. Note that the resulting machine check exception is imprecise and unordered with respect to the instruction that originated the bus operation. If MSR[ME] and the appropriate HID0 bits are set, the exception is recognized and handled; otherwise, the processor generates an internal checkstop condition. When a processor is in checkstop state, instruction processing is suspended and generally cannot continue without restarting the processor. Note that many conditions may lead to the checkstop condition; the disabled machine check exception is only one of these. A machine check exception may result from referencing a nonexistent physical address, either directly (with MSR[DR] = 0) or through an invalid translation. If a dcbz instruction introduces a block into the cache associated with a nonexistent physical address, a machine check exception can be delayed until an attempt is made to store that block to main memory. Not all processors that implement the PowerPC architecture provide the same level of error checking. Checkstop sources are implementation-dependent.
Machine check exceptions are enabled when MSR[ME] = 1; this is described in the following section, Section 4.5.2.1, "Machine Check Exception Enabled (MSR[ME] = 1)." If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state. Checkstop state is described in Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)."
4.5.2.1
Machine Check Exception Enabled (MSR[ME] = 1)
Machine check exceptions are enabled when MSR[ME] = 1. When a machine check exception is taken, registers are updated as shown in Table 4-9.
Table 4-9. Machine Check Exception--Register Settings
Register  Setting Description
SRR0      On a best-effort basis the MPC750 can set this to an EA of some instruction that was executing or about to be executing when the machine check condition occurred.
SRR1      0-10   Cleared
          11     Set when an L2 data cache parity error is detected, otherwise zero
          12     Set when MCP signal is asserted, otherwise zero
          13     Set when TEA signal is asserted, otherwise zero
          14     Set when a data bus parity error is detected, otherwise zero
          15     Set when an address bus parity error is detected, otherwise zero
          16-31  MSR[16-31]
MSR       POW 0     ILE --    EE  0     PR  0
          FP  0     ME  0     FE0 0     SE  0
          BE  0     FE1 0     IP  --    IR  0
          DR  0     PM  0     RI  0     LE  Set to value of ILE
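A machine check handler can use SRR1[11-15] from Table 4-9 to identify the cause. The following is a minimal sketch of that decode, using PowerPC bit numbering (bit 0 is the MSB); the macro and function names are illustrative rather than taken from this manual.

```c
#include <stdint.h>

/* SRR1 cause bits for a machine check (Table 4-9). */
#define SRR1_MC_L2_PARITY    UINT32_C(0x00100000)  /* SRR1 bit 11 */
#define SRR1_MC_MCP          UINT32_C(0x00080000)  /* SRR1 bit 12 */
#define SRR1_MC_TEA          UINT32_C(0x00040000)  /* SRR1 bit 13 */
#define SRR1_MC_DATA_PARITY  UINT32_C(0x00020000)  /* SRR1 bit 14 */
#define SRR1_MC_ADDR_PARITY  UINT32_C(0x00010000)  /* SRR1 bit 15 */
#define SRR1_MC_CAUSE_MASK   UINT32_C(0x001F0000)  /* bits 11-15 */

/* Extracts the cause bits from a saved SRR1 image; more than one
 * bit may be set. */
uint32_t machine_check_causes(uint32_t srr1)
{
    return srr1 & SRR1_MC_CAUSE_MASK;
}

/* Human-readable name for a single cause bit, for logging. */
const char *machine_check_cause_name(uint32_t cause_bit)
{
    switch (cause_bit) {
    case SRR1_MC_L2_PARITY:   return "L2 data cache parity error";
    case SRR1_MC_MCP:         return "MCP asserted";
    case SRR1_MC_TEA:         return "TEA asserted";
    case SRR1_MC_DATA_PARITY: return "data bus parity error";
    case SRR1_MC_ADDR_PARITY: return "address bus parity error";
    default:                  return "unknown";
    }
}
```

Since SRR1[16-31] also carry the saved MSR bits, the mask step matters: the low half of SRR1 should not be interpreted as cause information.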
Note that to handle another machine check exception, the exception handler should set MSR[ME] as soon as it is practical after a machine check exception is taken. Otherwise, subsequent machine check exceptions cause the processor to enter the checkstop state.
The machine check exception is usually unrecoverable in the sense that execution cannot resume in the context that existed before the exception. If the condition that caused the machine check does not otherwise prevent continued execution, MSR[ME] is set to allow the processor to continue execution at the machine check exception vector address. Typically, earlier processes cannot resume; however, operating systems can use the machine check exception handler to try to identify and log the cause of the machine check condition. When a machine check exception is taken, instruction fetching resumes at offset 0x00200 from the physical base address indicated by MSR[IP].
4.5.2.2
Checkstop State (MSR[ME] = 0)
If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state. When a processor is in checkstop state, instruction processing is suspended and generally cannot resume without the processor being reset. The contents of all latches are frozen within two cycles upon entering checkstop state.
4.5.3
DSI Exception (0x00300)
A DSI exception occurs when no higher priority exception exists and an error condition related to a data memory access occurs. The DSI exception is implemented as it is defined in the PowerPC architecture (OEA). In case of a TLB miss for a load, store, or cache operation, a DSI exception is taken if the resulting hardware table search causes a page fault. On the MPC750, a DSI exception is taken when a load or store is attempted to a direct-store segment (SR[T] = 1). In the MPC750, a floating-point load or store to a direct-store segment causes a DSI exception rather than an alignment exception, as specified by the PowerPC architecture. The MPC750 also implements the data address breakpoint facility, which is defined as optional in the PowerPC architecture and is supported by the optional data address breakpoint register (DABR). Although the architecture does not strictly prescribe how this facility must be implemented, the MPC750 follows the recommendations provided by the architecture and described in Chapter 2, "Programming Model," and Chapter 6, "Exceptions," in the Programming Environments Manual.
4.5.4
ISI Exception (0x00400)
An ISI exception occurs when no higher priority exception exists and an attempt to fetch the next instruction fails. This exception is implemented as it is defined by the PowerPC architecture (OEA), and is taken for the following conditions:
* The effective address cannot be translated.
* The fetch access is to a no-execute segment (SR[N] = 1).
* The fetch access is to guarded storage and MSR[IR] = 1.
* The fetch access is to a segment for which SR[T] is set.
* The fetch access violates memory protection.
When an ISI exception is taken, instruction fetching resumes at offset 0x00400 from the physical base address indicated by MSR[IP].
4.5.5
External Interrupt Exception (0x00500)
An external interrupt is signaled to the processor by the assertion of the external interrupt signal (INT). The INT signal is expected to remain asserted until the MPC750 takes the external interrupt exception. If INT is negated early, recognition of the interrupt request is not guaranteed. After the MPC750 begins execution of the external interrupt handler, the system can safely negate INT. When the MPC750 detects assertion of INT, it stops dispatching and waits for all pending instructions to complete. This allows any instructions in progress that need to take an exception to do so before the external interrupt is taken.
After all instructions have vacated the completion buffer, the MPC750 takes the external interrupt exception as defined in the PowerPC architecture (OEA). An external interrupt may be delayed by other higher priority exceptions or if MSR[EE] is cleared when the exception occurs. Register settings for this exception are described in Chapter 6, "Exceptions," in the Programming Environments Manual. When an external interrupt exception is taken, instruction fetching resumes at offset 0x00500 from the physical base address indicated by MSR[IP].
4.5.6 Alignment Exception (0x00600)
The MPC750 implements the alignment exception as defined by the PowerPC architecture (OEA). An alignment exception is initiated when any of the following occurs:
* The operand of a floating-point load or store is not word-aligned.
* The operand of lmw, stmw, lwarx, or stwcx. is not word-aligned.
* The operand of dcbz is in a page that is write-through or cache-inhibited.
* An attempt is made to execute dcbz when the data cache is disabled.
* An eciwx or ecowx is not word-aligned.
* A multiple or string access is attempted with MSR[LE] set.
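The conditions listed above can be expressed as a single predicate. The following C sketch is illustrative only: the `access_kind_t` classification, the `access_desc_t` structure, and the function name are hypothetical, and the sketch does not model exception priority (for example, the DSI case for direct-store segments).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the accesses named in the list above. */
typedef enum {
    ACC_FP_LOAD_STORE,  /* floating-point load or store */
    ACC_MULTIPLE,       /* lmw, stmw */
    ACC_STRING,         /* load/store string instructions */
    ACC_RESERVATION,    /* lwarx, stwcx. */
    ACC_EXTERNAL,       /* eciwx, ecowx */
    ACC_DCBZ,           /* dcbz */
    ACC_OTHER           /* anything not covered by the list */
} access_kind_t;

typedef struct {
    access_kind_t kind;
    uint32_t ea;               /* effective address of the operand */
    bool page_write_through;   /* page W attribute */
    bool page_cache_inhibited; /* page I attribute */
    bool dcache_enabled;       /* data cache enabled */
    bool msr_le;               /* MSR[LE] */
} access_desc_t;

/* Returns true if the access meets one of the listed conditions. */
static bool causes_alignment_exception(const access_desc_t *a)
{
    bool word_aligned = (a->ea & 0x3u) == 0;

    switch (a->kind) {
    case ACC_FP_LOAD_STORE:
    case ACC_RESERVATION:
    case ACC_EXTERNAL:
        return !word_aligned;
    case ACC_MULTIPLE:
        return !word_aligned || a->msr_le;
    case ACC_STRING:
        return a->msr_le;
    case ACC_DCBZ:
        return a->page_write_through || a->page_cache_inhibited ||
               !a->dcache_enabled;
    default:
        return false;
    }
}
```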
Note that in the MPC750, a floating-point load or store to a direct-store segment causes a DSI exception rather than the alignment exception specified by the PowerPC architecture. For more information, see Section 4.5.3, "DSI Exception (0x00300)."
4.5.7 Program Exception (0x00700)
The MPC750 implements the program exception as it is defined by the PowerPC architecture (OEA). A program exception occurs when no higher priority exception exists and one or more of the exception conditions defined in the OEA occur. The MPC750 invokes the system illegal instruction program exception when it detects any instruction from the illegal instruction class. The MPC750 fully decodes the SPR field of the instruction. If an undefined SPR is specified, a program exception is taken. The UISA defines mtspr and mfspr with the record bit (Rc) set as causing a program exception or giving a boundedly-undefined result. In the MPC750, the appropriate condition register (CR) should be treated as undefined. Likewise, the PowerPC architecture states that the Floating Compare Unordered (fcmpu) or Floating Compare Ordered (fcmpo) instruction with the record bit set can either cause a program exception or provide a boundedly-undefined result. In the MPC750, the BF field in an instruction encoding for these cases is considered undefined.
The MPC750 does not support either of the two floating-point imprecise modes supported by the PowerPC architecture. Unless exceptions are disabled (MSR[FE0] = MSR[FE1] = 0), all floating-point exceptions are treated as precise. When a program exception is taken, instruction fetching resumes at offset 0x00700 from the physical base address indicated by MSR[IP]. Chapter 6, "Exceptions," in the Programming Environments Manual describes register settings for this exception.
4.5.8 Floating-Point Unavailable Exception (0x00800)
The floating-point unavailable exception is implemented as defined in the PowerPC architecture. A floating-point unavailable exception occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction (including floating-point load, store, or move instructions), and the floating-point available bit in the MSR is cleared (MSR[FP] = 0). Register settings for this exception are described in Chapter 6, "Exceptions," in the Programming Environments Manual. When a floating-point unavailable exception is taken, instruction fetching resumes at offset 0x00800 from the physical base address indicated by MSR[IP].
4.5.9 Decrementer Exception (0x00900)
The decrementer exception is implemented in the MPC750 as it is defined by the PowerPC architecture. The decrementer exception occurs when no higher priority exception exists, a decrementer exception condition occurs (for example, the decrementer register has completed decrementing), and MSR[EE] = 1. In the MPC750, the decrementer register is decremented at one fourth the bus clock rate. Register settings for this exception are described in Chapter 6, "Exceptions," in the Programming Environments Manual. When a decrementer exception is taken, instruction fetching resumes at offset 0x00900 from the physical base address indicated by MSR[IP].
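Because the decrementer counts at one fourth the bus clock rate, the value to load for a given interval follows directly from the bus frequency. The C sketch below is illustrative: the function name and the microsecond interface are hypothetical, and the clamp to 0x7FFFFFFF assumes the architectural behavior that the exception condition is the most significant bit of the decrementer changing from 0 to 1.

```c
#include <stdint.h>

/* Hypothetical helper: decrementer load value for an interval given in
   microseconds. The decrementer is assumed to tick at bus_hz / 4. */
static uint32_t dec_ticks_for_us(uint32_t bus_hz, uint32_t interval_us)
{
    /* 64-bit arithmetic avoids overflow in the intermediate product. */
    uint64_t ticks = ((uint64_t)bus_hz / 4u) * interval_us / 1000000u;

    /* Keep the most significant bit clear so that the load itself does not
       look like an already-expired count (assumption, see lead-in). */
    return ticks > 0x7FFFFFFFu ? 0x7FFFFFFFu : (uint32_t)ticks;
}
```

With a 100-MHz bus clock, a 1-ms interval corresponds to 25,000 decrementer ticks under this model.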
4.5.10 System Call Exception (0x00C00)
A system call exception occurs when a System Call (sc) instruction is executed. In the MPC750, the system call exception is implemented as it is defined in the PowerPC architecture. Register settings for this exception are described in Chapter 6, "Exceptions," in the Programming Environments Manual. When a system call exception is taken, instruction fetching resumes at offset 0x00C00 from the physical base address indicated by MSR[IP].
4.5.11 Trace Exception (0x00D00)
The trace exception is taken if MSR[SE] = 1 or if MSR[BE] = 1 and the currently completing instruction is a branch. Each instruction considered during trace mode
completes before a trace exception is taken. Register settings for this exception are described in Chapter 6, "Exceptions," in the Programming Environments Manual. Implementation Note--The MPC750 processor diverges from the PowerPC architecture in that it does not take trace exceptions on the isync instruction. When a trace exception is taken, instruction fetching resumes at offset 0x00D00 from the base address indicated by MSR[IP].
4.5.12 Floating-Point Assist Exception (0x00E00)
The optional floating-point assist exception defined by the PowerPC architecture is not implemented in the MPC750.
4.5.13 Performance Monitor Interrupt (0x00F00)
The MPC750 microprocessor provides a performance monitor facility to monitor and count predefined events such as processor clocks, misses in either the instruction cache or the data cache, instructions dispatched to a particular execution unit, mispredicted branches, and other occurrences. The count of such events can be used to trigger the performance monitor exception. The performance monitor facility is not defined by the PowerPC architecture. The performance monitor can be used for the following:
* To increase system performance with efficient software, especially in a multiprocessing system. Memory hierarchy behavior must be monitored and studied to develop algorithms that schedule tasks (and perhaps partition them) and that structure and distribute data optimally.
* To help system developers bring up and debug their systems.
The performance monitor uses the following SPRs:
* The performance monitor counter registers (PMC1-PMC4) are used to record the number of times a certain event has occurred. UPMC1-UPMC4 provide user-level read access to these registers.
* The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitor interrupt functions. UMMCR0-UMMCR1 provide user-level read access to these registers.
* The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The USIA register provides user-level read access to the SIA.
Table 4-10 lists register settings when a performance monitor interrupt exception is taken.
Table 4-10. Performance Monitor Interrupt Exception--Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      0      Loaded with equivalent MSR bits
          1-4    Cleared
          5-9    Loaded with equivalent MSR bits
          10-15  Cleared
          16-31  Loaded with equivalent MSR bits
MSR       POW  0     FP   0     BE   0     DR  0
          ILE  --    ME   --    FE1  0     PM  0
          EE   0     FE0  0     IP   --    RI  0
          PR   0     SE   0     IR   0     LE  Set to value of ILE
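The register settings in Table 4-10 (which recur in Tables 4-11 through 4-13) can be modeled in a few lines of C. This is a sketch for illustration rather than a description of the hardware: the structure and function names are hypothetical, and the MSR bit masks assume PowerPC bit numbering with bit 0 as the most significant bit.

```c
#include <stdint.h>

/* MSR bits that survive the exception (shown as "--" in the table),
   plus LE, which takes the value of ILE. Mask values are assumed. */
#define MSR_ILE 0x00010000u /* bit 15 */
#define MSR_ME  0x00001000u /* bit 19 */
#define MSR_IP  0x00000040u /* bit 25 */
#define MSR_LE  0x00000001u /* bit 31 */

/* SRR1 bits 0, 5-9, and 16-31 are loaded from the MSR;
   bits 1-4 and 10-15 are cleared. */
#define SRR1_MSR_COPY_MASK 0x87C0FFFFu

typedef struct {
    uint32_t srr0;
    uint32_t srr1;
    uint32_t msr;
} saved_state_t;

/* Hypothetical model of the state saved when the interrupt is taken. */
static saved_state_t take_interrupt(uint32_t msr, uint32_t next_ea)
{
    saved_state_t s;

    s.srr0 = next_ea;                          /* next instruction's EA */
    s.srr1 = msr & SRR1_MSR_COPY_MASK;         /* selected MSR bits */
    s.msr = msr & (MSR_ILE | MSR_ME | MSR_IP); /* unchanged bits only */
    if (msr & MSR_ILE)
        s.msr |= MSR_LE;                       /* LE set to value of ILE */
    return s;
}
```

Note that MSR[ILE] lies in the cleared SRR1 range (bits 10-15), so it is preserved in the new MSR but not copied into SRR1.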
As with other PowerPC exceptions, the performance monitor interrupt follows the normal PowerPC exception model with a defined exception vector offset (0x00F00). The priority of the performance monitor interrupt lies between the external interrupt and the decrementer interrupt (see Table 4-3). The contents of the SIA are described in Section 2.1.2.4, "Performance Monitor Registers." The performance monitor is described in Chapter 11, "Performance Monitor."
4.5.14 Instruction Address Breakpoint Exception (0x01300)
An instruction address breakpoint interrupt occurs when the following conditions are met:
* The instruction breakpoint address IABR[0-29] matches EA[0-29] of the next instruction to complete in program order. The instruction that triggers the instruction address breakpoint exception is not executed before the exception handler is invoked.
* The translation enable bit (IABR[TE]) matches MSR[IR].
* The breakpoint enable bit (IABR[BE]) is set. The address match is also reported to the JTAG/COP block, which may subsequently generate a soft or hard reset. The instruction tagged with the match does not complete before the breakpoint exception is taken.
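The three match conditions above can be combined into one predicate. The C sketch below is illustrative: the function name is hypothetical, and the positions of IABR[BE] (bit 30) and IABR[TE] (bit 31) are assumptions that should be checked against Section 2.1.2.1.

```c
#include <stdbool.h>
#include <stdint.h>

#define IABR_ADDR_MASK 0xFFFFFFFCu /* IABR[0-29], word address */
#define IABR_BE        0x00000002u /* breakpoint enable, assumed bit 30 */
#define IABR_TE        0x00000001u /* translation enable, assumed bit 31 */
#define MSR_IR         0x00000020u /* instruction translation, bit 26 */

/* Returns true when all three breakpoint conditions hold for the next
   instruction to complete at effective address ea. */
static bool iabr_matches(uint32_t iabr, uint32_t ea, uint32_t msr)
{
    bool enabled = (iabr & IABR_BE) != 0;
    bool te = (iabr & IABR_TE) != 0;
    bool ir = (msr & MSR_IR) != 0;
    bool addr_match = ((iabr ^ ea) & IABR_ADDR_MASK) == 0;

    return enabled && (te == ir) && addr_match;
}
```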
Table 4-11 lists register settings when an instruction address breakpoint exception is taken.
Table 4-11. Instruction Address Breakpoint Exception--Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      0      Loaded with equivalent MSR bits
          1-4    Cleared
          5-9    Loaded with equivalent MSR bits
          10-15  Cleared
          16-31  Loaded with equivalent MSR bits
MSR       POW  0     FP   0     BE   0     DR  0
          ILE  --    ME   --    FE1  0     PM  0
          EE   0     FE0  0     IP   --    RI  0
          PR   0     SE   0     IR   0     LE  Set to value of ILE
The MPC750 requires that an mtspr to the IABR be followed by a context-synchronizing instruction. The MPC750 cannot generate a breakpoint response for that context-synchronizing instruction if the breakpoint is enabled by the mtspr(IABR) immediately preceding it. The MPC750 also cannot block a breakpoint response on the context-synchronizing instruction if the breakpoint was disabled by the mtspr(IABR) instruction immediately preceding it. The format of the IABR register is shown in Section 2.1.2.1, "Instruction Address Breakpoint Register (IABR)." When an instruction address breakpoint exception is taken, instruction fetching resumes at offset 0x01300 from the base address indicated by MSR[IP].
4.5.15 System Management Interrupt (0x01400)
The MPC750 implements a system management interrupt exception, which is not defined by the PowerPC architecture. The system management exception is very similar to the external interrupt exception and is particularly useful in implementing the nap mode. It has priority over an external interrupt (see Table 4-3), and it uses a different vector in the exception table (offset 0x01400). Table 4-12 lists register settings when a system management interrupt exception is taken.
Table 4-12. System Management Interrupt Exception--Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      0      Loaded with equivalent MSR bits
          1-4    Cleared
          5-9    Loaded with equivalent MSR bits
          10-15  Cleared
          16-31  Loaded with equivalent MSR bits
MSR       POW  0     FP   0     BE   0     DR  0
          ILE  --    ME   --    FE1  0     PM  0
          EE   0     FE0  0     IP   --    RI  0
          PR   0     SE   0     IR   0     LE  Set to value of ILE
Like the external interrupt, a system management interrupt is signaled to the MPC750 by the assertion of an input signal. The system management interrupt signal (SMI) is expected to remain asserted until the interrupt is taken. If SMI is negated early, recognition of the interrupt request is not guaranteed. After the MPC750 begins execution of the system management interrupt handler, the system can safely negate SMI. After the assertion of SMI is detected, the MPC750 stops dispatching instructions and waits for all pending instructions to complete. This allows any instructions in progress that need to take an exception to do so before the system management interrupt is taken. When a system management interrupt exception is taken, instruction fetching resumes at offset 0x01400 from the base address indicated by MSR[IP].
4.5.16 Thermal Management Interrupt Exception (0x01700)
A thermal management interrupt is generated when the junction temperature crosses a threshold programmed in either THRM1 or THRM2. The exception is enabled by the TIE bit of either THRM1 or THRM2, and can be masked by clearing MSR[EE]. Table 4-13 lists register settings when a thermal management interrupt exception is taken.
Table 4-13. Thermal Management Interrupt Exception--Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      0      Loaded with equivalent MSR bits
          1-4    Cleared
          5-9    Loaded with equivalent MSR bits
          10-15  Cleared
          16-31  Loaded with equivalent MSR bits
MSR       POW  0     FP   0     BE   0     DR  0
          ILE  --    ME   --    FE1  0     PM  0
          EE   0     FE0  0     IP   --    RI  0
          PR   0     SE   0     IR   0     LE  Set to value of ILE
The thermal management interrupt is similar to the system management and external interrupts. The MPC750 requires the next instruction in program order to complete or take an exception, blocks completion of any following instructions, and allows the completed store queue to drain. Any exceptions encountered in this process are taken first and the thermal management interrupt exception is delayed until a recoverable halt is achieved, at which point the MPC750 saves the machine state, as shown in Table 4-13. When a thermal management interrupt exception is taken, instruction fetching resumes at offset 0x01700 from the base address indicated by MSR[IP]. Chapter 10, "Power and Thermal Management," gives details about thermal management.
Chapter 5 Memory Management
This chapter describes the MPC750 microprocessor's implementation of the memory management unit (MMU) specifications provided by the operating environment architecture (OEA) for processors that implement the PowerPC architecture. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
The primary function of the MMU in a processor of this family is the translation of logical (effective) addresses to physical addresses (referred to as real addresses in the architecture specification) for memory accesses and I/O accesses (I/O accesses are assumed to be memory-mapped). In addition, the MMU provides access protection on a segment, block, or page basis. This chapter describes the specific hardware used to implement the MMU model of the OEA in the MPC750. Refer to Chapter 7, "Memory Management," in the Programming Environments Manual for a complete description of the conceptual model. Note that the MPC750 does not implement the optional direct-store facility and it is not likely to be supported in future devices.
Two general types of memory accesses generated by processors that implement the PowerPC architecture require address translation--instruction accesses and data accesses generated by load and store instructions. Generally, the address translation mechanism is defined in terms of the segment descriptors and page tables defined by the PowerPC architecture for locating the effective-to-physical address mapping for memory accesses. The segment information translates the effective address to an interim virtual address, and the page table information translates the interim virtual address to a physical address. The segment descriptors, used to generate the interim virtual addresses, are stored as on-chip segment registers on 32-bit implementations (such as the MPC750).
In addition, two translation lookaside buffers (TLBs) are implemented on the MPC750 to keep recently-used page address translations on-chip. Although the PowerPC OEA describes one MMU (conceptually), the MPC750 hardware maintains separate TLBs and table search resources for instruction and data accesses that can be performed independently (and simultaneously). Therefore, the MPC750 is described as having two MMUs, one for instruction accesses (IMMU) and one for data accesses (DMMU). The block address translation (BAT) mechanism is a software-controlled array that stores the available block address translations on-chip. BAT array entries are implemented as pairs
of BAT registers that are accessible as supervisor special-purpose registers (SPRs). There are separate instruction and data BAT mechanisms, and in the MPC750, they reside in the instruction and data MMUs, respectively. The MMUs, together with the exception processing mechanism, provide the necessary support for the operating system to implement a paged virtual memory environment and for enforcing protection of designated memory areas. Exception processing is described in Chapter 4, "Exceptions." Section 4.3, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs.
5.1 MMU Overview
The MPC750 implements the memory management specification of the PowerPC OEA for 32-bit implementations. Thus, it provides 4 Gbytes of effective address space accessible to supervisor and user programs, with a 4-Kbyte page size and 256-Mbyte segment size. In addition, the MMUs of 32-bit processors of this family use an interim virtual address (52 bits) and hashed page tables in the generation of 32-bit physical addresses. These processors also have a BAT mechanism for mapping large blocks of memory. Block sizes range from 128 Kbyte to 256 Mbyte and are software-programmable. Basic features of the MPC750 MMU implementation defined by the OEA are as follows:
* Support for real addressing mode--Effective-to-physical address translation can be disabled separately for data and instruction accesses.
* Block address translation--Each of the BAT array entries (four IBAT entries and four DBAT entries) provides a mechanism for translating blocks as large as 256 Mbytes from the 32-bit effective address space into the physical memory space. This can be used for translating large address ranges whose mappings do not change frequently.
* Segmented address translation--The 32-bit effective address is extended to a 52-bit virtual address by substituting 24 bits of upper address bits from the segment register, for the 4 upper bits of the EA, which are used as an index into the segment register file. This 52-bit virtual address space is divided into 4-Kbyte pages, each of which can be mapped to a physical page.
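The segmented address translation step described above can be sketched in C. The sketch is illustrative: the function name is hypothetical, and it assumes that the 24-bit value substituted from the segment register (the virtual segment ID) occupies the low-order 24 bits of the register, which should be checked against the segment register format in the Programming Environments Manual.

```c
#include <stdint.h>

/* Assumed position of the 24-bit virtual segment ID in a segment register. */
#define SR_VSID_MASK 0x00FFFFFFu

/* Forms the 52-bit interim virtual address from a 32-bit effective address.
   sr[] models the 16 on-chip segment registers. */
static uint64_t virtual_address(const uint32_t sr[16], uint32_t ea)
{
    uint32_t sr_index = ea >> 28;                /* EA[0-3] selects the SR */
    uint64_t vsid = sr[sr_index] & SR_VSID_MASK; /* 24 bits from the SR */

    /* The 24-bit segment value replaces the 4 upper bits of the EA:
       24 + 28 = 52 bits in total. */
    return (vsid << 28) | (ea & 0x0FFFFFFFu);
}
```

The low-order 28 bits carry the 16-bit page index and the 12-bit byte offset within the 4-Kbyte page unchanged.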
The MPC750 also provides the following features that are not required by the PowerPC architecture:
* Separate translation lookaside buffers (TLBs)--The 128-entry, two-way set-associative ITLBs and DTLBs keep recently-used page address translations on-chip.
* Table search operations performed in hardware--The 52-bit virtual address is formed and the MMU attempts to fetch the PTE, which contains the physical address, from the appropriate TLB on-chip. If the translation is not found in a TLB (that is, a TLB miss occurs), the hardware performs a table search operation (using a hashing function) to search for the PTE.
* TLB invalidation--The MPC750 implements the optional TLB Invalidate Entry (tlbie) and TLB Synchronize (tlbsync) instructions, which can be used to invalidate TLB entries. For more information on the tlbie and tlbsync instructions, see Section 5.4.3.2, "TLB Invalidation."
Table 5-1 summarizes the MPC750 MMU features, including those defined by the PowerPC architecture (OEA) for 32-bit processors and those specific to the MPC750.
Table 5-1. MMU Feature Summary
Feature Category            Architecturally Defined/   Feature
                            MPC750-Specific
Address ranges              Architecturally defined    2^32 bytes of effective address
                                                       2^52 bytes of virtual address
                                                       2^32 bytes of physical address
Page size                   Architecturally defined    4 Kbytes
Segment size                Architecturally defined    256 Mbytes
Block address translation   Architecturally defined    Range of 128 Kbyte-256 Mbyte sizes
                                                       Implemented with IBAT and DBAT registers in BAT array
Memory protection           Architecturally defined    Segments selectable as no-execute
                                                       Pages selectable as user/supervisor and read-only or guarded
                                                       Blocks selectable as user/supervisor and read-only or guarded
Page history                Architecturally defined    Referenced and changed bits defined and maintained
Page address translation    Architecturally defined    Translations stored as PTEs in hashed page tables in memory
                                                       Page table size determined by mask in SDR1 register
TLBs                        Architecturally defined    Instructions for maintaining TLBs (tlbie and tlbsync instructions in MPC750)
                            MPC750-specific            128-entry, two-way set associative ITLB
                                                       128-entry, two-way set associative DTLB
                                                       LRU replacement algorithm
Segment descriptors         Architecturally defined    Stored as segment registers on-chip (two identical copies maintained)
Page table search support   MPC750-specific            The MPC750 performs the table search operation in hardware.
5.1.1 Memory Addressing
A program references memory using the effective (logical) address computed by the processor when it executes a load, store, branch, or cache instruction, and when it fetches the next instruction. The effective address is translated to a physical address according to the procedures described in Chapter 7, "Memory Management," in the Programming
Environments Manual, augmented with information in this chapter. The memory subsystem uses the physical address for the access. For a complete discussion of effective address calculation, see Section 2.3.2.3, "Effective Address Calculation."
5.1.2 MMU Organization
Figure 5-1 shows the conceptual organization of a PowerPC MMU in a 32-bit implementation; note that it does not describe the specific hardware used to implement the memory management function for a particular processor. Processors may optionally implement on-chip TLBs, hardware support for the automatic search of the page tables for PTEs, and other hardware features (invisible to the system software) not shown. The MPC750 maintains two on-chip TLBs with the following characteristics:
* 128 entries, two-way set associative (64 x 2), LRU replacement
* Data TLB supports the DMMU; instruction TLB supports the IMMU
* Hardware TLB update
* Hardware update of referenced (R) and changed (C) bits in the translation table
In the event of a TLB miss, the hardware attempts to load the TLB based on the results of a translation table search operation. Figure 5-2 and Figure 5-3 show the conceptual organization of the MPC750 instruction and data MMUs, respectively. The instruction addresses shown in Figure 5-2 are generated by the processor for sequential instruction fetches and addresses that correspond to a change of program flow. Data addresses shown in Figure 5-3 are generated by load, store, and cache instructions. As shown in the figures, after an address is generated, the high-order bits of the effective address, EA[0-19] (or a smaller set of address bits, EA[0-n], in the cases of blocks), are translated into physical address bits PA[0-19]. The low-order address bits, A[20-31], are untranslated and are therefore identical for both effective and physical addresses. After translating the address, the MMUs pass the resulting 32-bit physical address to the memory subsystem. The MMUs record whether the translation is for an instruction or data access, whether the processor is in user or supervisor mode and, for data accesses, whether the access is a load or a store operation. The MMUs use this information to appropriately direct the address translation and to enforce the protection hierarchy programmed by the operating system. Section 4.3, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs. The figures show how address bits A[20-26] index into the on-chip instruction and data caches to select a cache set. The remaining physical address bits are then compared with the tag fields (comprised of bits PA[0-19]) of the two selected cache blocks to determine if
a cache hit has occurred. In the case of a cache miss on the MPC750, the instruction or data access is then forwarded to the L2 interface tags to check for an L2 cache hit. In case of a miss (and in all cases of an on-chip cache miss on the MPC740) the access is forwarded to the bus interface unit which initiates an external memory access.
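The split of an address into a cache set index and a tag, as described above, can be sketched in C. The function names are hypothetical, and the shift amounts are derived only from the bit ranges quoted in the text (A[20-26] for the set index, PA[0-19] for the tag), with bit 0 taken as the most significant bit of a 32-bit address.

```c
#include <stdint.h>

/* A[20-26]: seven untranslated bits that select one of 128 cache sets.
   Bits A[27-31] below them address bytes within the cache block. */
static uint32_t cache_set_index(uint32_t addr)
{
    return (addr >> 5) & 0x7Fu;
}

/* PA[0-19]: the translated physical page number, compared against the
   tag fields of the selected set. */
static uint32_t cache_tag(uint32_t pa)
{
    return pa >> 12;
}
```

Because the index bits fall entirely within the untranslated A[20-31] range, set selection does not depend on the result of the address translation.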
[Block diagram: data and instruction accesses (EA[0-19], A[20-31]) enter the 32-bit MMU; EA[0-3] selects one of the 16 segment registers, which supply the upper 24 bits of the virtual address; the IBAT0U/IBAT0L-IBAT3U/IBAT3L and DBAT0U/DBAT0L-DBAT3U/DBAT3L registers signal a BAT hit; the optional on-chip TLBs and optional page table search logic (SDR1, SPR 25) supply PA[0-19], which is combined with A[20-31] to form PA[0-31].]
Figure 5-1. MMU Conceptual Block Diagram--32-Bit Implementations
[Block diagram: the instruction unit and BPU supply EA[0-19] and A[20-31] to the IMMU, which contains the segment registers (0-15, selected by EA[0-3]), the ITLB, the IBAT array (IBAT0U/IBAT0L-IBAT3U/IBAT3L), and the page table search logic (SDR1, SPR 25); A[20-26] selects one of the I cache sets (0-127), and PA[0-19] is compared with the tags to produce I cache hit/miss and PA[0-31].]
Figure 5-2. MPC750 Microprocessor IMMU Block Diagram
[Block diagram: the load/store unit supplies EA[0-19] and A[20-31] to the DMMU, which contains the segment registers (0-15, selected by EA[0-3]), the DTLB, the DBAT array (DBAT0U/DBAT0L-DBAT3U/DBAT3L), and the page table search logic (SDR1, SPR 25); A[20-26] selects one of the D cache sets (0-127), and PA[0-19] is compared with the tags to produce D cache hit/miss and PA[0-31].]
Figure 5-3. MPC750 Microprocessor DMMU Block Diagram
5.1.3 Address Translation Mechanisms
Processors that implement the PowerPC architecture support the following three types of address translation:
* Page address translation--translates the page frame address for a 4-Kbyte page size.
* Block address translation--translates the block number for blocks that range in size from 128 Kbytes to 256 Mbytes.
* Real addressing mode address translation--when address translation is disabled, the physical address is identical to the effective address.
Figure 5-4 shows the three address translation mechanisms provided by the MMUs. The segment descriptors shown in the figure control the page address translation mechanism. When an access uses page address translation, the appropriate segment descriptor is required. In 32-bit implementations, the appropriate segment descriptor is selected from the 16 on-chip segment registers by the four highest-order effective address bits. A control bit in the corresponding segment descriptor then determines if the access is to memory (memory-mapped) or to the direct-store interface space. Note that the direct-store interface was present in the architecture only for compatibility with existing I/O devices that used this interface. However, it is being removed from the architecture, and the MPC750 does not support it. When an access is determined to be to the direct-store interface space, the MPC750 takes a DSI exception if it is a data access (see Section 4.5.3, "DSI Exception (0x00300)"), and takes an ISI exception if it is an instruction access (see Section 4.5.4, "ISI Exception (0x00400)").
For memory accesses translated by a segment descriptor, the interim virtual address is generated using the information in the segment descriptor. Page address translation corresponds to the conversion of this virtual address into the 32-bit physical address used by the memory subsystem. In most cases, the physical address for the page resides in an on-chip TLB and is available for quick access. However, if the page address translation misses in the on-chip TLB, the MMU causes a search of the page tables in memory (using the virtual address information and a hashing function) to locate the required physical address.
Because blocks are larger than pages, block address translation translates fewer upper-order effective address bits into physical address bits; more low-order address bits (at least 17) are left untranslated to form the offset into a block.
Also, instead of segment descriptors and a TLB, block address translations use the on-chip BAT registers as a BAT array. If an effective address matches the corresponding field of a BAT register, the information in the BAT register is used to generate the physical address; in this case, the results of the page translation (occurring in parallel) are ignored.
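Block address translation against a single BAT register pair can be sketched in C as follows. The sketch is a simplified model under stated assumptions: the field positions in the upper register (block effective page index in bits 0-14, an 11-bit block length mask in bits 19-29, supervisor and user valid bits in bits 30 and 31) and in the lower register (block physical number in bits 0-14) follow the architecture's BAT format as commonly documented and should be verified against the Programming Environments Manual. The type and function names are hypothetical, and protection checking is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t upper; /* assumed: BEPI bits 0-14, BL bits 19-29, Vs, Vp */
    uint32_t lower; /* assumed: BRPN bits 0-14, attribute bits below */
} bat_pair_t;

/* Returns true and writes *pa if ea hits in the given BAT pair. */
static bool bat_translate(bat_pair_t bat, uint32_t ea, bool user_mode,
                          uint32_t *pa)
{
    uint32_t bl = (bat.upper >> 2) & 0x7FFu;             /* block length */
    uint32_t valid = bat.upper & (user_mode ? 0x1u : 0x2u); /* Vp or Vs */
    /* Untranslated bits: the 17 low-order bits plus any bits set in BL. */
    uint32_t offset_mask = (bl << 17) | 0x0001FFFFu;

    if (!valid)
        return false;
    if ((ea & ~offset_mask) != (bat.upper & 0xFFFE0000u & ~offset_mask))
        return false;

    *pa = (bat.lower & 0xFFFE0000u & ~offset_mask) | (ea & offset_mask);
    return true;
}
```

With a block length mask of zero the block is 128 Kbytes (17 untranslated bits); with all 11 mask bits set it is 256 Mbytes (28 untranslated bits), which matches the size range given in the text.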
[Flow diagram: an effective address takes one of three paths. With address translation disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode is used and the effective address equals the physical address (see Section 5.2). On a match with the BAT registers, block address translation produces the physical address (see Section 5.3). Otherwise a segment descriptor is located: T = 1 selects direct-store interface translation, which results in a DSI/ISI exception; T = 0 selects page address translation, which forms the virtual address and looks it up in the page table to produce the physical address.]
Figure 5-4. Address Translation Types
When the processor generates an access, and the corresponding address translation enable bit in MSR is cleared, the resulting physical address is identical to the effective address and all other translation mechanisms are ignored. Instruction address translation and data address translation are enabled by setting MSR[IR] and MSR[DR], respectively.
5.1.4 Memory Protection Facilities
In addition to the translation of effective addresses to physical addresses, the MMUs provide access protection of supervisor areas from user access and can designate areas of memory as read-only as well as no-execute or guarded. Table 5-2 shows the protection options supported by the MMUs for pages.
Table 5-2. Access Protection Options for Pages
Option                                        User Read        User    Supervisor Read   Supervisor
                                              I-Fetch  Data    Write   I-Fetch  Data     Write
Supervisor-only                               --       --      --      Y        Y        Y
Supervisor-only-no-execute                    --       --      --      --       Y        Y
Supervisor-write-only                         Y        Y       --      Y        Y        Y
Supervisor-write-only-no-execute              --       Y       --      --       Y        Y
Both (user/supervisor)                        Y        Y       Y       Y        Y        Y
Both (user/supervisor) no-execute             --       Y       Y       --       Y        Y
Both (user/supervisor) read-only              Y        Y       --      Y        Y        --
Both (user/supervisor) read-only-no-execute   --       Y       --      --       Y        --
Y  Access permitted
-- Protection violation
The no-execute option provided in the segment register lets the operating system program determine whether instructions can be fetched from an area of memory. The remaining options are enforced based on a combination of information in the segment descriptor and the page table entry. Thus, the supervisor-only option allows only read and write operations generated while the processor is operating in supervisor mode (MSR[PR] = 0) to access the page. User accesses that map into a supervisor-only page cause an exception. Finally, a facility in the VEA and OEA allows pages or blocks to be designated as guarded, preventing out-of-order accesses that may cause undesired side effects. For example, areas of the memory map used to control I/O devices can be marked as guarded so accesses do not occur unless they are explicitly required by the program. For more information on memory protection, see "Memory Protection Facilities," in Chapter 7, "Memory Management," in the Programming Environments Manual.
5.1.5
Page History Information
The MMUs of these processors also define referenced (R) and changed (C) bits in the page address translation mechanism that can be used as history information relevant to the page. The operating system can use these bits to determine which areas of memory to write back to disk when new pages must be allocated in main memory. While these bits are initially programmed by the operating system into the page table, the architecture specifies that they can be maintained either by the processor hardware (automatically) or by some software-assist mechanism.
Implementation Note--When loading the TLB, the MPC750 checks the state of the changed and referenced bits for the matched PTE. If the referenced bit is not set and the table search operation is initially caused by a load operation or by an instruction fetch, the MPC750 automatically sets the referenced bit in the translation table. Similarly, if the table search operation is caused by a store operation and either the referenced bit or the changed bit is not set, the hardware automatically sets both bits in the translation table. In addition, when the address translation of a store operation hits in the DTLB, the MPC750 checks the state of the changed bit. If the bit is not already set, the hardware automatically updates the DTLB and the translation table in memory to set the changed bit. For more information, see Section 5.4.1, "Page History Recording."
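The table-search case of the implementation note can be modeled as a small C function. This is only a sketch: the R and C bit positions (bits 23 and 24 of the second PTE word) follow the 32-bit PTE layout in the Programming Environments Manual and are an assumption here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_R 0x00000100u  /* referenced bit: PTE word 1, bit 23 (assumed OEA layout) */
#define PTE_C 0x00000080u  /* changed bit:    PTE word 1, bit 24 (assumed OEA layout) */

typedef enum { OP_IFETCH, OP_LOAD, OP_STORE } op_t;

/* Models the R/C update made while a table search loads the TLB.
 * Returns true if the PTE in memory had to be written. */
bool pte_update_on_table_search(uint32_t *pte_lo, op_t op)
{
    assert(pte_lo != NULL);
    /* Loads and instruction fetches need R; stores need both R and C. */
    uint32_t want = (op == OP_STORE) ? (PTE_R | PTE_C) : PTE_R;
    if ((*pte_lo & want) == want)
        return false;           /* history bits already reflect this access */
    *pte_lo |= want;            /* a store sets both bits if either was clear */
    return true;
}
```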
5.1.6
General Flow of MMU Address Translation
The following sections describe the general flow used by processors that implement the PowerPC architecture to translate effective addresses to virtual and then physical addresses.
5.1.6.1
Real Addressing Mode and Block Address Translation Selection
When an instruction or data access is generated and the corresponding instruction or data translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode is used (physical address equals effective address) and the access continues to the memory subsystem as described in Section 5.2, "Real Addressing Mode." Figure 5-5 shows the flow the MMUs use in determining whether to select real addressing mode, block address translation, or the segment descriptor to select page address translation. Note that if the BAT array search results in a hit, the access is qualified with the appropriate protection bits. If the access violates the protection mechanism, an exception (ISI or DSI exception) is generated.
Effective address generated
  I-access:
    Instruction translation disabled (MSR[IR] = 0): perform real addressing mode translation
    Instruction translation enabled (MSR[IR] = 1): compare address with instruction BAT array
  D-access:
    Data translation disabled (MSR[DR] = 0): perform real addressing mode translation
    Data translation enabled (MSR[DR] = 1): compare address with data BAT array
  BAT array miss: perform address translation with segment descriptor (see Figure 5-6)
  BAT array hit (see the Programming Environments Manual):
    Access permitted: translate address, then continue access to memory subsystem
    Access protected: access faulted
Figure 5-5. General Flow of Address Translation (Real Addressing Mode and Block)
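The selection flow of Figure 5-5 can be summarized as a C sketch. The MSR masks follow from IR and DR being MSR bits 26 and 27 (with bit 0 as the most-significant bit); the enum and function names are hypothetical, and the BAT comparison itself is reduced to a boolean input.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR 0x00000020u  /* MSR bit 26: instruction address translation enable */
#define MSR_DR 0x00000010u  /* MSR bit 27: data address translation enable        */

typedef enum {
    XLATE_REAL,     /* physical address equals effective address   */
    XLATE_BLOCK,    /* BAT array hit: block address translation     */
    XLATE_SEGMENT   /* BAT array miss: use the segment descriptor   */
} xlate_t;

/* bat_hit is the result of comparing the address with the instruction or
 * data BAT array, as appropriate for the access. */
xlate_t select_translation(uint32_t msr, bool is_ifetch, bool bat_hit)
{
    uint32_t enable = is_ifetch ? MSR_IR : MSR_DR;
    if (!(msr & enable))
        return XLATE_REAL;
    return bat_hit ? XLATE_BLOCK : XLATE_SEGMENT;
}
```

Protection checking on a BAT hit is deliberately left out of this sketch; it only models which mechanism is selected.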
5.1.6.2
Page Address Translation Selection
If address translation is enabled and the effective address information does not match a BAT array entry, the segment descriptor must be located. When the segment descriptor is located, the T bit in the segment descriptor selects whether the translation is to a page or to a direct-store segment as shown in Figure 5-6. For 32-bit implementations, the segment descriptor for an access is contained in one of 16 on-chip segment registers; effective address bits EA[0-3] select one of the 16 segment registers. Note that the MPC750 does not implement the direct-store interface, and accesses to these segments cause a DSI or ISI exception.
Figure 5-6 also shows the way in which the no-execute protection is enforced; if the N bit in the segment descriptor is set and the access is an instruction fetch, the access is faulted as described in Chapter 7, "Memory Management," in the Programming Environments Manual. Note that the figure shows the flow for these cases as described by the PowerPC OEA, and so the TLB references are shown as optional. Because the MPC750 implements TLBs, these branches are valid and are described in more detail throughout this chapter.
Address translation with segment descriptor
  Use EA[0-3] to select one of 16 on-chip segment registers
  Check T bit in segment descriptor
    Direct-store segment address (T = 1)*: DSI/ISI exception
    Page address translation (T = 0):
      I-fetch with N bit set in segment descriptor (no-execute): access faulted
      Otherwise: generate 52-bit virtual address from segment descriptor
        Compare virtual address with TLB entries (+)
          TLB hit (see Figure 5-8):
            Access permitted: translate address, then continue access to memory subsystem
            Access protected: access faulted
          TLB miss: perform page table search operation (see Figure 5-9)
            PTE found: load TLB entry (+) and repeat the TLB comparison
            PTE not found: access faulted
(+) Optional to the PowerPC architecture. Implemented in the MPC750.
* In the case of instruction accesses, causes ISI exception
Figure 5-6. General Flow of Page and Direct-Store Interface Address Translation
If SR[T] = 0, page address translation is selected. The information in the segment descriptor is then used to generate the 52-bit virtual address. The virtual address is then used to identify the page address translation information (stored as page table entries (PTEs) in a
page table in memory). For increased performance, the MPC750 has two on-chip TLBs to cache recently-used translations on-chip. If an access hits in the appropriate TLB, page translation succeeds and the physical address bits are forwarded to the memory subsystem. If the required translation is not resident, the MMU performs a search of the page table. If the required PTE is found, a TLB entry is allocated and the page translation is attempted again. This time, the TLB is guaranteed to hit. When the translation is located, the access is qualified with the appropriate protection bits. If the access causes a protection violation, either an ISI or DSI exception is generated. If the PTE is not found by the table search operation, a page fault condition exists, and an ISI or DSI exception occurs so software can handle the page fault.
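The formation of the 52-bit virtual address can be illustrated with a short C sketch. It assumes the T = 0 segment register layout of the 32-bit OEA (T in bit 0, N in bit 3, VSID in bits 8-31, with bit 0 as the most-significant bit); the function names are hypothetical.

```c
#include <stdint.h>

#define SR_T    0x80000000u  /* bit 0: T = 1 selects a direct-store segment */
#define SR_N    0x10000000u  /* bit 3: no-execute                            */
#define SR_VSID 0x00FFFFFFu  /* bits 8-31: 24-bit virtual segment ID         */

/* EA[0-3] selects one of the 16 segment registers. */
unsigned sr_index(uint32_t ea)
{
    return ea >> 28;
}

/* 52-bit virtual address = VSID (24 bits) || EA[4-19] (16-bit page index)
 * || EA[20-31] (12-bit byte offset). */
uint64_t virtual_address(uint32_t sr, uint32_t ea)
{
    uint64_t vsid = sr & SR_VSID;
    return (vsid << 28) | (ea & 0x0FFFFFFFu);
}

/* 40-bit virtual page number: the part compared against TLB entries. */
uint64_t virtual_page_number(uint32_t sr, uint32_t ea)
{
    return virtual_address(sr, ea) >> 12;
}
```

For example, with EA = 0xC0123456 the access uses SR12; if that register holds VSID 0xABCDEF, the virtual address is 0xABCDEF0123456 and the virtual page number is 0xABCDEF0123.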
5.1.7
MMU Exceptions Summary
To complete any memory access, the effective address must be translated to a physical address. As specified by the architecture, an MMU exception condition occurs if this translation fails for one of the following reasons:
* Page fault--there is no valid entry in the page table for the page specified by the effective address (and segment descriptor) and there is no valid BAT translation.
* An address translation is found but the access is not allowed by the memory protection mechanism.
The translation exception conditions defined by the OEA for 32-bit implementations cause either the ISI or the DSI exception to be taken as shown in Table 5-3. The state saved by the processor for each of these exceptions contains information that identifies the address of the failing instruction. Refer to Chapter 4, "Exceptions," for a more detailed description of exception processing.
Table 5-3. Translation Exception Conditions
Condition | Description | Exception
Page fault (no PTE found) | No matching PTE found in page tables (and no matching BAT array entry) | I access: ISI exception, SRR1[1] = 1. D access: DSI exception, DSISR[1] = 1
Block protection violation | Conditions described for block in "Block Memory Protection" in Chapter 7, "Memory Management," in the Programming Environments Manual | I access: ISI exception, SRR1[4] = 1. D access: DSI exception, DSISR[4] = 1
Page protection violation | Conditions described for page in "Page Memory Protection" in Chapter 7, "Memory Management," in the Programming Environments Manual | I access: ISI exception, SRR1[4] = 1. D access: DSI exception, DSISR[4] = 1
No-execute protection violation | Attempt to fetch instruction when SR[N] = 1 | ISI exception, SRR1[3] = 1
Instruction fetch from direct-store segment | Attempt to fetch instruction when SR[T] = 1 | ISI exception, SRR1[3] = 1
Data access to direct-store segment (including floating-point accesses) | Attempt to perform load or store (including FP load or store) when SR[T] = 1 | DSI exception, DSISR[5] = 1
Instruction fetch from guarded memory | Attempt to fetch instruction when MSR[IR] = 1 and either matching xBAT[G] = 1, or no matching BAT entry and PTE[G] = 1 | ISI exception, SRR1[3] = 1
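An exception handler might classify the conditions of Table 5-3 along the following lines. This is a sketch rather than a complete handler: the bit positions come from the table, PPC_BIT converts the manual's bit numbering (bit 0 is the most-significant bit) into a mask, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Converts a PowerPC bit number (0 = most-significant) into a 32-bit mask. */
#define PPC_BIT(n) (1u << (31 - (n)))

typedef enum {
    MMU_CAUSE_NONE,
    MMU_CAUSE_PAGE_FAULT,        /* SRR1[1] or DSISR[1]                       */
    MMU_CAUSE_PROTECTION,        /* SRR1[4] or DSISR[4]: block or page        */
    MMU_CAUSE_IFETCH_FORBIDDEN,  /* SRR1[3]: no-execute, direct-store,        */
                                 /* or guarded memory                         */
    MMU_CAUSE_DIRECT_STORE_DATA  /* DSISR[5]                                  */
} mmu_cause_t;

/* Classifies an ISI exception from the value saved in SRR1. */
mmu_cause_t decode_isi(uint32_t srr1)
{
    if (srr1 & PPC_BIT(1)) return MMU_CAUSE_PAGE_FAULT;
    if (srr1 & PPC_BIT(3)) return MMU_CAUSE_IFETCH_FORBIDDEN;
    if (srr1 & PPC_BIT(4)) return MMU_CAUSE_PROTECTION;
    return MMU_CAUSE_NONE;
}

/* Classifies a DSI exception from the value in DSISR. */
mmu_cause_t decode_dsi(uint32_t dsisr)
{
    if (dsisr & PPC_BIT(1)) return MMU_CAUSE_PAGE_FAULT;
    if (dsisr & PPC_BIT(4)) return MMU_CAUSE_PROTECTION;
    if (dsisr & PPC_BIT(5)) return MMU_CAUSE_DIRECT_STORE_DATA;
    return MMU_CAUSE_NONE;
}
```

Note that SRR1[3] alone does not distinguish the three instruction-fetch conditions that share it; a handler would need to inspect the segment register and the BAT or PTE attributes to tell them apart.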
In addition to the translation exceptions, there are other MMU-related conditions (some of them defined as implementation-specific, and therefore not required by the architecture) that can cause an exception to occur. These exception conditions map to processor exceptions as shown in Table 5-4. The only MMU exception conditions that occur when MSR[DR] = 0 are those that cause an alignment exception for data accesses. For more detailed information about the conditions that cause an alignment exception (in particular for string/multiple instructions), see Section 4.5.6, "Alignment Exception (0x00600)."
Note that some exception conditions depend upon whether the memory area is set up as write-through (W = 1) or cache-inhibited (I = 1). These bits are described fully in "Memory/Cache Access Attributes," in Chapter 5, "Cache Model and Memory Coherency," of the Programming Environments Manual. Refer to Chapter 4, "Exceptions," and to Chapter 6, "Exceptions," in the Programming Environments Manual for a complete description of the SRR1 and DSISR bit settings for these exceptions.
The LSU initiates out-of-order accesses without knowledge of whether it is legal to do so. However, the MMU does not perform a hardware table search due to TLB misses until the request is required by the program flow. In these out-of-order cases, the MMU detects protection violations and whether a dcbz instruction specifies a page marked as write-through or cache-inhibited. The MMU also detects alignment exceptions caused by the dcbz instruction and prevents the changed bit in the PTE from being updated erroneously in these cases.
Table 5-4. Other MMU Exception Conditions for the MPC750 Processor
Condition | Description | Exception
dcbz with W = 1 or I = 1 | dcbz instruction to write-through or cache-inhibited segment or block | Alignment exception (not required by architecture for this condition)
lwarx, stwcx., eciwx, or ecowx instruction to direct-store segment | Reservation instruction or external control instruction when SR[T] = 1 | DSI exception, DSISR[5] = 1
Floating-point load or store to direct-store segment | FP memory access when SR[T] = 1 | See data access to direct-store segment in Table 5-3.
Load or store that results in a direct-store error | Does not occur in MPC750 | Does not apply
eciwx or ecowx attempted when external control facility disabled | eciwx or ecowx attempted with EAR[E] = 0 | DSI exception, DSISR[11] = 1
lmw, stmw, lswi, lswx, stswi, or stswx instruction attempted in little-endian mode | lmw, stmw, lswi, lswx, stswi, or stswx instruction attempted while MSR[LE] = 1 | Alignment exception
Operand misalignment | Translation enabled and a floating-point load/store, stmw, stwcx., lmw, lwarx, eciwx, or ecowx instruction operand is not word-aligned | Alignment exception (some of these cases are implementation-specific)
5.1.8
MMU Instructions and Register Summary
The MMU instructions and registers allow the operating system to set up the block address translation areas and the page tables in memory. Note that because the implementation of TLBs is optional, the instructions that refer to these structures are also optional. However, as these structures serve as caches of the page table, the architecture specifies a software protocol for maintaining coherency between these caches and the tables in memory whenever the tables in memory are modified. When the tables in memory are changed, the operating system purges these caches of the corresponding entries, allowing the translation caching mechanism to refetch from the tables when the corresponding entries are required. Note that the MPC750 implements all TLB-related instructions except tlbia, which is treated as an illegal instruction.
Because the MMU specification for these processors is so flexible, it is recommended that the software that uses these instructions and registers be encapsulated into subroutines to minimize the impact of migrating across the family of implementations.
Table 5-5 summarizes MPC750 instructions that specifically control the MMU. For more detailed information about the instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in the Programming Environments Manual.
Table 5-5. MPC750 Microprocessor Instruction Summary--Control MMUs
Instruction | Description
mtsr SR,rS | Move to Segment Register. SR[SR#] ← rS
mtsrin rS,rB | Move to Segment Register Indirect. SR[rB[0-3]] ← rS
mfsr rD,SR | Move from Segment Register. rD ← SR[SR#]
mfsrin rD,rB | Move from Segment Register Indirect. rD ← SR[rB[0-3]]
tlbie rB* | TLB Invalidate Entry. For effective address specified by rB, TLB[V] ← 0. The tlbie instruction invalidates all TLB entries indexed by the EA, and operates on both the instruction and data TLBs simultaneously, invalidating four TLB entries. The index corresponds to bits 14-19 of the EA. Software must ensure that instruction fetches or memory references to the virtual pages specified by the tlbie instruction have been completed prior to executing the tlbie instruction.
tlbsync* | TLB Synchronize. Synchronizes the execution of all other tlbie instructions in the system. In the MPC750, when the TLBISYNC signal is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. When the TLBISYNC signal is asserted, instruction execution stops after the completion of a tlbsync instruction. See Section 8.8.2, "TLBISYNC Input" for more information.
*These instructions are defined by the PowerPC architecture, but are optional.
Table 5-6 summarizes the registers that the operating system uses to program the MPC750 MMUs. These registers are accessible to supervisor-level software only. These registers are described in Chapter 2, "Programming Model."
Table 5-6. MPC750 Microprocessor MMU Registers
Register | Description
Segment registers (SR0-SR15) | The sixteen 32-bit segment registers are present only in 32-bit implementations of the PowerPC architecture. The fields in the segment register are interpreted differently depending on the value of bit 0. The segment registers are accessed by the mtsr, mtsrin, mfsr, and mfsrin instructions.
BAT registers (IBAT0U-IBAT3U, IBAT0L-IBAT3L, DBAT0U-DBAT3U, and DBAT0L-DBAT3L) | There are 16 BAT registers, organized as four pairs of instruction BAT registers (IBAT0U-IBAT3U paired with IBAT0L-IBAT3L) and four pairs of data BAT registers (DBAT0U-DBAT3U paired with DBAT0L-DBAT3L). The BAT registers are defined as 32-bit registers in 32-bit implementations. These are special-purpose registers that are accessed by the mtspr and mfspr instructions.
SDR1 | The SDR1 register specifies the variables used in accessing the page tables in memory. SDR1 is defined as a 32-bit register for 32-bit implementations. This special-purpose register is accessed by the mtspr and mfspr instructions.
If an MMU register is being accessed by an instruction in the instruction stream, the IMMU stalls for one translation cycle to perform that operation. The sequencer serializes instructions to ensure the data correctness. For updating the IBATs and SRs, the sequencer classifies those operations as fetch serializing. After such an instruction is dispatched, the
instruction buffer is flushed and the fetch stalls until the instruction completes. However, for reading from the IBATs, the operation is classified as execution serializing. As long as the LSU ensures that all previous instructions can be executed, subsequent instructions can be fetched and dispatched.
5.2
Real Addressing Mode
If address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0) for a particular access, the effective address is treated as the physical address and is passed directly to the memory subsystem as described in Chapter 7, "Memory Management," in the Programming Environments Manual.
Note that the default WIMG bits (0b0011) cause data accesses to be considered cacheable (I = 0) and thus load and store accesses are weakly ordered. This is the case even if the data cache is disabled in the HID0 register (as it is out of hard reset). If I/O devices require load and store accesses to occur in strict program order (strongly ordered), translation must be enabled so that the corresponding I bit can be set. Note also that the G bit must be set to ensure that the accesses are strongly ordered.
For instruction accesses, the default memory access mode bits (WIMG) are also 0b0011. That is, instruction accesses are considered cacheable (I = 0), and the memory is guarded. Again, instruction accesses are considered cacheable even if the instruction cache is disabled in the HID0 register (as it is out of hard reset). The W and M bits have no effect on the instruction cache.
For information on the synchronization requirements for changes to MSR[IR] and MSR[DR], refer to Section 2.3.2.4, "Synchronization," in this manual, and "Synchronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," in the Programming Environments Manual.
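As a worked example of the default attributes, the 4-bit WIMG value can be decoded as follows. The sketch assumes the bits are packed in the order W, I, M, G from most- to least-significant, as the name and the 0b0011 notation suggest; the struct and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool w;  /* write-through            */
    bool i;  /* caching-inhibited        */
    bool m;  /* memory coherency required */
    bool g;  /* guarded                  */
} wimg_t;

/* Decodes a 4-bit WIMG value packed as W:I:M:G (W most-significant). */
wimg_t decode_wimg(unsigned bits)
{
    wimg_t a;
    a.w = (bits & 0x8u) != 0;
    a.i = (bits & 0x4u) != 0;
    a.m = (bits & 0x2u) != 0;
    a.g = (bits & 0x1u) != 0;
    return a;
}
```

Decoding the real addressing mode default, decode_wimg(0x3), gives W = 0, I = 0, M = 1, G = 1: cacheable, coherent, and guarded, which matches the description above.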
5.3
Block Address Translation
The block address translation (BAT) mechanism in the OEA provides a way to map ranges of effective addresses larger than a single page into contiguous areas of physical memory. Such areas can be used for data that is not subject to normal virtual memory handling (paging), such as a memory-mapped display buffer or an extremely large array of numerical data. Block address translation in the MPC750 is described in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations. Implementation Note-- The MPC750 BAT registers are not initialized by the hardware after the power-up or reset sequence. Consequently, all valid bits in both instruction and data BAT areas must be explicitly cleared before setting any BAT area for the first time and before enabling translation. Also, note that software must avoid overlapping blocks while updating a BAT area or areas. Even if translation is disabled, multiple BAT area hits (with
the valid bits set) can corrupt the remaining portion (any bits except the valid bits) of the BAT registers. Thus, multiple BAT hits (with valid bits set) are considered a programming error whether translation is enabled or disabled, and can lead to unpredictable results if translation is enabled (or, if translation is disabled, when translation is eventually enabled). For unused BATs (if translation is to be enabled), it is a sufficient precaution to simply clear the valid bits of the unused BAT entries.
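The two precautions in the implementation note (clearing all valid bits first, and never leaving two valid blocks that overlap) can be sketched on a software copy of the BAT registers. The upper-register layout used here (BEPI in bits 0-14, BL in bits 19-29, Vs and Vp in bits 30-31) is taken from the 32-bit OEA and is an assumption as far as this section is concerned; the struct and function names are hypothetical, and real code would move the values to the registers with mtspr.

```c
#include <stdbool.h>
#include <stdint.h>

#define BATU_BEPI     0xFFFE0000u  /* bits 0-14: block effective page index */
#define BATU_BL_SHIFT 2            /* bits 19-29: block length mask          */
#define BATU_BL_MASK  0x7FFu
#define BATU_VS       0x00000002u  /* bit 30: valid in supervisor mode       */
#define BATU_VP       0x00000001u  /* bit 31: valid in user mode             */

typedef struct { uint32_t upper, lower; } bat_t;  /* software image of one BAT pair */

/* Clears Vs and Vp in every entry, as required before the first BAT setup. */
void bat_clear_valid(bat_t *bats, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        bats[i].upper &= ~(BATU_VS | BATU_VP);
}

static uint32_t bat_bl(bat_t b)   { return (b.upper >> BATU_BL_SHIFT) & BATU_BL_MASK; }
static uint64_t bat_size(bat_t b) { return ((uint64_t)bat_bl(b) + 1u) << 17; }  /* 128 KB units */
static uint64_t bat_base(bat_t b) { return b.upper & BATU_BEPI & ~(bat_bl(b) << 17); }

/* True if both entries are valid and their effective address ranges intersect. */
bool bat_overlap(bat_t a, bat_t b)
{
    if (!(a.upper & (BATU_VS | BATU_VP)) || !(b.upper & (BATU_VS | BATU_VP)))
        return false;
    return bat_base(a) < bat_base(b) + bat_size(b) &&
           bat_base(b) < bat_base(a) + bat_size(a);
}
```

A setup routine could run bat_overlap over every pair of entries in its intended configuration before writing any of them to the hardware.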
5.4
Memory Segment Model
The MPC750 adheres to the memory segment model as defined in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations. Memory in the PowerPC OEA is divided into 256-Mbyte segments. This segmented memory model provides a way to map 4-Kbyte pages of effective addresses to 4-Kbyte pages in physical memory (page address translation), while providing the programming flexibility afforded by a large virtual address space (52 bits). The segment/page address translation mechanism may be superseded by the block address translation (BAT) mechanism described in Section 5.3, "Block Address Translation." If not, the translation proceeds in the following two steps:
1. From effective address to the virtual address (which never exists as a specific entity but can be considered to be the concatenation of the virtual page number and the byte offset within a page)
2. From virtual address to physical address
This section highlights those areas of the memory segment model defined by the OEA that are specific to the MPC750.
5.4.1
Page History Recording
Referenced (R) and changed (C) bits in each PTE keep history information about the page. They are maintained by a combination of the MPC750 table search hardware and the system software. The operating system uses this information to determine which areas of memory to write back to disk when new pages must be allocated in main memory. Referenced and changed recording is performed only for accesses made with page address translation and not for translations made with the BAT mechanism or for accesses that correspond to direct-store (T = 1) segments. Furthermore, R and C bits are maintained only for accesses made while address translation is enabled (MSR[IR] = 1 or MSR[DR] = 1). In the MPC750, the referenced and changed bits are updated as follows:
* For TLB hits, the C bit is updated according to Table 5-7.
* For TLB misses, when a table search operation is in progress to locate a PTE, the R and C bits are updated (set, if required) to reflect the status of the page based on this access.
Table 5-7. Table Search Operations to Update History Bits--TLB Hit Case
R and C bits in TLB Entry | Processor Action
00 | Combination doesn't occur
01 | Combination doesn't occur
10 | Read: No special action. Write: The MPC750 initiates a table search operation to update C.
11 | No special action for read or write
The table shows that the status of the C bit in the TLB entry (in the case of a TLB hit) is what causes the processor to update the C bit in the PTE (the R bit is assumed to be set in the page tables if there is a TLB hit). Therefore, when software clears the R and C bits in the page tables in memory, it must invalidate the TLB entries associated with the pages whose referenced and changed bits were cleared. The dcbt and dcbtst instructions can execute if there is a TLB/BAT hit or if the processor is in real addressing mode. In case of a TLB or BAT miss, these instructions are treated as no-ops; they do not initiate a table search operation and they do not set either the R or C bits. As defined by the PowerPC architecture, the referenced and changed bits are updated as if address translation were disabled (real addressing mode). If these update accesses hit in the data cache, they are not seen on the external bus. If they miss in the data cache, they are performed as typical cache line fill accesses on the bus (assuming the data cache is enabled).
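The TLB-hit cases of Table 5-7 reduce to a small decision function, sketched here in C with hypothetical names. The R = 0 combinations are reported separately because, per the table, they do not occur in a TLB entry.

```c
#include <stdbool.h>

typedef enum {
    HIT_COMBINATION_DOES_NOT_OCCUR,  /* R = 0 in a TLB entry               */
    HIT_NO_SPECIAL_ACTION,
    HIT_TABLE_SEARCH_TO_SET_C        /* write with C = 0 in the TLB entry  */
} hit_action_t;

/* r and c are the R and C bits held in the matching TLB entry. */
hit_action_t tlb_hit_action(bool r, bool c, bool is_write)
{
    if (!r)
        return HIT_COMBINATION_DOES_NOT_OCCUR;
    if (!c && is_write)
        return HIT_TABLE_SEARCH_TO_SET_C;
    return HIT_NO_SPECIAL_ACTION;
}
```

Because the decision depends only on the C bit cached in the TLB entry, a stale entry with C = 1 would suppress the update after software clears C in memory, which is why the text requires the corresponding TLB entries to be invalidated.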
5.4.1.1
Referenced Bit
The referenced (R) bit of a page is located in the PTE in the page table. Every time a page is referenced (with a read or write access) and the R bit is zero, the MPC750 sets the R bit in the page table. The OEA specifies that the referenced bit may be set immediately, or the setting may be delayed until the memory access is determined to be successful. Because the reference to a page is what causes a PTE to be loaded into the TLB, the referenced bit in all MPC750 TLB entries is effectively always set. The processor never automatically clears the referenced bit. The referenced bit is only a hint to the operating system about the activity of a page. At times, the referenced bit may be set although the access was not logically required by the program or even if the access was prevented by memory protection. Examples of this include the following:
* Fetching of instructions not subsequently executed
* A memory reference caused by a speculatively executed instruction that is mispredicted
* Accesses generated by an lswx or stswx instruction with a zero length
* Accesses generated by an stwcx. instruction when no store is performed because a reservation does not exist
* Accesses that cause exceptions and are not completed
5.4.1.2
Changed Bit
The changed bit of a page is located both in the PTE in the page table and in the copy of the PTE loaded into the TLB (if a TLB is implemented, as in the MPC750). Whenever a data store instruction is executed successfully, if the TLB search (for page address translation) results in a hit, the changed bit in the matching TLB entry is checked. If it is already set, it is not updated. If the TLB changed bit is 0, the MPC750 initiates the table search operation to set the C bit in the corresponding PTE in the page table. The MPC750 then reloads the TLB (with the C bit set).
The changed bit (in both the TLB and the PTE in the page tables) is set only when a store operation is allowed by the page memory protection mechanism and the store is guaranteed to be in the execution path (unless an exception, other than those caused by the sc, rfi, or trap instructions, occurs). Furthermore, the following conditions may cause the C bit to be set:
* The execution of an stwcx. instruction is allowed by the memory protection mechanism but a store operation is not performed.
* The execution of an stswx instruction is allowed by the memory protection mechanism but a store operation is not performed because the specified length is zero.
* The store operation is not performed because an exception occurs before the store is performed.
Again, note that although the execution of the dcbt and dcbtst instructions may cause the R bit to be set, they never cause the C bit to be set.
5.4.1.3
Scenarios for Referenced and Changed Bit Recording
This section provides a summary of the model (defined by the OEA) that is used by the processors for maintaining the referenced and changed bits. In some scenarios, the bits are guaranteed to be set by the processor; in others, the architecture allows that the bits may be set (not absolutely required); and in still others, the bits are guaranteed not to be set. Note that when the MPC750 updates the R and C bits in memory, the accesses are performed as if MSR[DR] = 0 and G = 0 (that is, as nonguarded cacheable operations in which coherency is required).
Table 5-8 defines a prioritized list of the R and C bit settings for all scenarios. The entries in the table are prioritized from top to bottom, such that a matching scenario occurring closer to the top of the table takes precedence over a matching scenario closer to the bottom of the table. For example, if an stwcx. instruction causes a protection violation and there is no reservation, the C bit is not altered, as shown for the protection violation case. Note that in the table, load operations include those generated by load instructions, by the eciwx instruction, and by the cache management instructions that are treated as a load with respect to address translation. Similarly, store operations include those operations generated by store instructions, by the ecowx instruction, and by the cache management instructions that are treated as a store with respect to address translation.
Table 5-8. Model for Guaranteed R and C Bit Settings
Priority | Scenario | Causes Setting of R Bit: OEA | Causes Setting of R Bit: MPC750 | Causes Setting of C Bit: OEA | Causes Setting of C Bit: MPC750
1 | No-execute protection violation | No | No | No | No
2 | Page protection violation | Maybe | Yes | No | No
3 | Out-of-order instruction fetch or load operation | Maybe | No | No | No
4 | Out-of-order store operation. Would be required by the sequential execution model in the absence of system-caused or imprecise exceptions, or of floating-point assist exception for instructions that would cause no other kind of precise exception. | Maybe(1) | No | No | No
5 | All other out-of-order store operations | Maybe(1) | No | Maybe(1) | No
6 | Zero-length load (lswx) | Maybe | No | No | No
7 | Zero-length store (stswx) | Maybe(1) | No | Maybe(1) | No
8 | Store conditional (stwcx.) that does not store | Maybe(1) | Yes | Maybe(1) | Yes
9 | In-order instruction fetch | Yes(2) | Yes | No | No
10 | Load instruction or eciwx | Yes | Yes | No | No
11 | Store instruction, ecowx or dcbz instruction | Yes | Yes | Yes | Yes
12 | icbi, dcbt, or dcbtst instruction | Maybe | No | No | No
13 | dcbst or dcbf instruction | Maybe | Yes | No | No
14 | dcbi instruction | Maybe(1) | Yes | Maybe(1) | Yes
Notes:
(1) If C is set, R is guaranteed to be set also.
(2) Includes the case in which the instruction is fetched out of order and R is not set (does not apply for MPC750).
For more information, see "Page History Recording" in Chapter 7, "Memory Management," of the Programming Environments Manual.
5.4.2
Page Memory Protection
The MPC750 implements page memory protection as it is defined in Chapter 7, "Memory Management," in the Programming Environments Manual.
5.4.3
TLB Description
The MPC750 implements separate 128-entry data and instruction TLBs to maximize performance. This section describes the hardware resources provided in the MPC750 to facilitate page address translation. Note that the hardware implementation of the MMU is not specified by the architecture, and while this description applies to the MPC750, it does not necessarily apply to other processors of this family.
5.4.3.1
TLB Organization
Because the MPC750 has two MMUs (IMMU and DMMU) that operate in parallel, some of the MMU resources are shared, and some are actually duplicated (shadowed) in each MMU to maximize performance. For example, although the architecture defines a single set of segment registers for the MMU, the MPC750 maintains two identical sets of segment registers, one for the IMMU and one for the DMMU; when an instruction that updates the segment register executes, the MPC750 automatically updates both sets. Each TLB contains 128 entries organized as a two-way set-associative array with 64 sets as shown in Figure 5-7 for the DTLB (the ITLB organization is the same). When an address is being translated, a set of two TLB entries is indexed in parallel with the access to a segment register. If the address in one of the two TLB entries is valid and matches the 40-bit virtual page number, that TLB entry contains the translation. If no match is found, a TLB miss occurs. Unless the access is the result of an out-of-order access, a hardware table search operation begins if there is a TLB miss. If the access is out of order, the table search operation is postponed until the access is required, at which point the access is no longer out of order. When the matching PTE is found in memory, it is loaded into the TLB entry selected by the least-recently-used (LRU) replacement algorithm, and the translation process begins again, this time with a TLB hit.
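Figure 5-7. Segment Register and DTLB Organization
(Diagram: EA[0-3] selects one of the 16 segment registers, 0-15, each holding T in bit 0 and the VSID in bits 8-31. EA[14-19] selects one of the 64 DTLB sets, 0-63. The VSID and EA[4-13] are compared with the valid (V) entries in line 0 and line 1 of the selected set; a hit in either line selects, through a multiplexer, the RPN that supplies PA[0-19].)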
The TLB entries are on-chip copies of PTEs in the page tables in memory and are similar in structure. To uniquely identify a TLB entry as the required PTE, the TLB entry also contains four more bits of the page index, EA[10-13] (in addition to the API bits in the PTE). Software cannot access the TLB arrays directly, except to invalidate an entry with the tlbie instruction. Each set of TLB entries has one associated LRU bit. The LRU bit for a set is updated any time either entry is used, even if the access is speculative. Invalid entries are always the first to be replaced. Although both MMUs can be accessed simultaneously (both sets of segment registers and TLBs can be accessed in the same clock), only one exception condition can be reported at a time. ITLB miss exception conditions are reported when there are no more instructions to
be dispatched or retired (the pipeline is empty). Refer to Chapter 6, "Instruction Timing," for more detailed information about the internal pipelines and the reporting of exceptions.
When an instruction or data access occurs, the effective address is routed to the appropriate MMU. EA[0-3] select one of the 16 segment registers, and the remaining effective address bits and the VSID field from the segment register are passed to the TLB. EA[14-19] then select two entries in the TLB; the valid bits are checked, and the 40-bit virtual page number (24-bit VSID and EA[4-19]) must match the VSID, EAPI, and API fields of the TLB entries. If one of the entries hits, the PP bits are checked for a protection violation. If these bits do not cause an exception, the C bit is checked and a table search operation is initiated if C must be updated. If C does not require updating, the RPN value is passed to the memory subsystem and the WIMG bits are then used as attributes for the access.
Although address translation is disabled on a reset condition, the valid bits of TLB entries are not automatically cleared. Thus, TLB entries must be explicitly cleared by the system software (with the tlbie instruction) before address translation is enabled. Also, note that the segment registers do not have a valid bit, and so they should also be initialized before translation is enabled.
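The lookup described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names TlbEntry, ea_field, and tlb_lookup are hypothetical, the TLB is modeled as 64 sets of two entries, and the PP, C, and WIMG checks as well as LRU replacement are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_SETS = 64  # 128 entries organized as 64 sets of two entries


@dataclass
class TlbEntry:
    """Hypothetical software model of one TLB entry (not the hardware layout)."""
    valid: bool = False
    vsid: int = 0     # 24-bit virtual segment ID
    page_hi: int = 0  # EA[4-13], the API and EAPI bits used as part of the tag
    rpn: int = 0      # 20-bit physical page number


def ea_field(ea: int, first: int, last: int) -> int:
    """Extract EA[first-last]; bit 0 is the most-significant bit of 32."""
    width = last - first + 1
    return (ea >> (31 - last)) & ((1 << width) - 1)


def tlb_lookup(tlb: List[List[TlbEntry]], segment_regs: List[int],
               ea: int) -> Optional[int]:
    """Return the 32-bit physical address on a TLB hit, or None on a miss."""
    vsid = segment_regs[ea_field(ea, 0, 3)]   # EA[0-3] selects a segment register
    tlb_set = tlb[ea_field(ea, 14, 19)]       # EA[14-19] selects a set of two entries
    page_hi = ea_field(ea, 4, 13)             # remaining page index bits
    for entry in tlb_set:
        if entry.valid and entry.vsid == vsid and entry.page_hi == page_hi:
            # Physical address is the RPN concatenated with the byte offset EA[20-31].
            return (entry.rpn << 12) | ea_field(ea, 20, 31)
    return None  # TLB miss: the hardware would start a table search
```

In this model a miss simply returns None; on the MPC750 the miss would instead trigger the hardware table search described in Section 5.4.5.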
5.4.3.2
TLB Invalidation
The MPC750 implements the optional tlbie and tlbsync instructions, which are used to invalidate TLB entries. The execution of the tlbie instruction always invalidates four entries--both the ITLB and DTLB entries indexed by EA[14-19]. The architecture allows tlbie to optionally enable a TLB invalidate signaling mechanism in hardware so that other processors also invalidate their resident copies of the matching PTE. The MPC750 does not signal the TLB invalidation to other processors nor does it perform any action when a TLB invalidation is performed by another processor.
The tlbsync instruction causes instruction execution to stop if the TLBISYNC signal is asserted. If TLBISYNC is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. Section 8.8.2, "TLBISYNC Input," describes the TLB synchronization mechanism in further detail.
The tlbia instruction is not implemented on the MPC750; when its opcode is encountered, an illegal instruction program exception is generated. To invalidate all entries of both TLBs, 64 tlbie instructions must be executed, incrementing the value in EA[14-19] by one each time. See Chapter 8, "Instruction Set," in the Programming Environments Manual for detailed information about the tlbie instruction.
Software must ensure that instruction fetches or memory references to the virtual pages specified by the tlbie have been completed prior to executing the tlbie instruction. Other than the possible TLB miss on the next instruction prefetch, the tlbie instruction does not affect the instruction fetch operation--that is, it does not purge the prefetch buffer and does not cause the prefetched instructions to be refetched.
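As an illustration of the 64-instruction invalidation sequence, the following sketch computes one effective address per TLB set by stepping the EA[14-19] field from 0 to 63. The function name tlbie_addresses is hypothetical, and the sketch assumes that the other effective address bits can be left as zero because only EA[14-19] index the entries. On hardware, each address would be the operand of a tlbie instruction; the required synchronization is described in the Programming Environments Manual and is not modeled here.

```python
PAGE_SHIFT = 12  # EA[20-31] is the 12-bit byte offset, so EA[14-19] starts at bit 12
NUM_SETS = 64    # EA[14-19] is a 6-bit set index


def tlbie_addresses():
    """Return 64 effective addresses whose EA[14-19] fields cover 0 through 63."""
    return [index << PAGE_SHIFT for index in range(NUM_SETS)]
```

The addresses produced are 0x00000, 0x01000, and so on up to 0x3F000, one per set.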
5.4.4
Page Address Translation Summary
Figure 5-8 provides the detailed flow for the page address translation mechanism. The figure includes the checking of the N bit in the segment descriptor and then expands on the `TLB Hit' branch of Figure 5-6. The detailed flow for the `TLB Miss' branch of Figure 5-6 is described in Section 5.4.5, "Page Table Search Operation." Note that as in the case of block address translation, if an attempt is made to execute a dcbz instruction to a page marked either write-through or caching-inhibited (W = 1 or I = 1), an alignment exception is generated. The checking of memory protection violation conditions is described in Chapter 7, "Memory Management," in the Programming Environments Manual.
Figure 5-8. Page Address Translation Flow--TLB Hit
(Flow diagram not reproduced. Summary: after the effective address is generated (see Figure 5-6), an instruction fetch with the N bit set in the segment descriptor (no-execute) is checked first. Otherwise, page address translation generates the 52-bit virtual address from the segment descriptor and compares it with the TLB entries. In the TLB hit case, a dcbz instruction with W or I = 1 causes an alignment exception; otherwise the page memory protection violation conditions are checked (see the Programming Environments Manual). If access is prohibited, a page memory protection violation results. If access is permitted, a store access with PTE[C] = 0 starts a page table search operation (see Figure 5-9); otherwise PA[0-31] is formed as RPN||A[20-31] and the access continues to the memory subsystem with the WIMG bits from the PTE.)
5.4.5
Page Table Search Operation
If the translation is not found in the TLBs (a TLB miss), the MPC750 initiates a table search operation, which is described in this section. Formats for the PTE are given in "PTE Format for 32-Bit Implementations," in Chapter 7, "Memory Management," of the Programming Environments Manual. The following is a summary of the page table search process performed by the MPC750:
1. The 32-bit physical address of the primary PTEG is generated as described in "Page Table Addresses" in Chapter 7, "Memory Management," of the Programming Environments Manual.
2. The first PTE (PTE0) in the primary PTEG is read from memory. PTE reads occur with an implied WIM memory/cache mode control bit setting of 0b001. Therefore, they are considered cacheable and read (burst) from memory and placed in the cache.
3. The PTE in the selected PTEG is tested for a match with the virtual page number (VPN) of the access. The VPN is the VSID concatenated with the page index field of the virtual address. For a match to occur, the following must be true:
   -- PTE[H] = 0
   -- PTE[V] = 1
   -- PTE[VSID] = VA[0-23]
   -- PTE[API] = VA[24-29]
4. If a match is not found, step 3 is repeated for each of the other seven PTEs in the primary PTEG. If a match is found, the table search process continues as described in step 8. If a match is not found within the 8 PTEs of the primary PTEG, the address of the secondary PTEG is generated.
5. The first PTE (PTE0) in the secondary PTEG is read from memory. Again, because PTE reads have a WIM bit combination of 0b001, an entire cache line is read into the on-chip cache.
6. The PTE in the selected secondary PTEG is tested for a match with the virtual page number (VPN) of the access. For a match to occur, the following must be true:
   -- PTE[H] = 1
   -- PTE[V] = 1
   -- PTE[VSID] = VA[0-23]
   -- PTE[API] = VA[24-29]
7. If a match is not found, step 6 is repeated for each of the other seven PTEs in the secondary PTEG. If a match is never found, an exception is taken (step 9).
8. If a match is found, the PTE is written into the on-chip TLB and the R bit is updated in the PTE in memory (if necessary). If there is no memory protection violation, the C bit is also updated in memory (if the access is a write operation) and the table search is complete.
9. If a match is not found within the 8 PTEs of the secondary PTEG, the search fails, and a page fault exception condition occurs (either an ISI exception or a DSI exception).
Figure 5-9 and Figure 5-10 show how the conceptual model for the primary and secondary page table search operations, described in the Programming Environments Manual, is realized in the MPC750. Figure 5-9 shows the case of a dcbz instruction that is executed with W = 1 or I = 1, and that the R bit may be updated in memory (if required) before the operation is performed or the alignment exception occurs. The R bit may also be updated if memory protection is violated.
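The matching rules in steps 3 through 9 can be sketched in software as follows. This is an illustrative model only: the Pte class and the functions search_pteg and table_search are hypothetical names, each PTEG is passed in as a list of eight already-decoded entries, and the hashing of PTEG addresses, the TLB load, and the R and C bit updates are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pte:
    """Hypothetical decoded PTE holding only the fields used for matching."""
    v: int     # valid bit
    h: int     # hash function identifier (0 = primary, 1 = secondary)
    vsid: int  # compared with VA[0-23]
    api: int   # compared with VA[24-29]
    rpn: int = 0


def search_pteg(pteg: List[Pte], vsid: int, api: int, h: int) -> Optional[Pte]:
    """Test each PTE in the PTEG in order; return the first match, if any."""
    for pte in pteg:
        if pte.v == 1 and pte.h == h and pte.vsid == vsid and pte.api == api:
            return pte
    return None


def table_search(primary: List[Pte], secondary: List[Pte],
                 vsid: int, api: int) -> Optional[Pte]:
    """Search the primary PTEG (H = 0), then the secondary PTEG (H = 1)."""
    hit = search_pteg(primary, vsid, api, h=0)
    if hit is None:
        hit = search_pteg(secondary, vsid, api, h=1)
    return hit  # None models the page fault (ISI or DSI exception)
```

Note that an entry with H = 1 stored in the primary PTEG does not match in this model, which mirrors the PTE[H] = 0 condition of step 3.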
Figure 5-9. Primary Page Table Search
(Flow diagram not reproduced. Summary: the PA of the primary PTEG is generated using the primary hash function, and 64-bit PTEs are fetched one at a time (PA, then PA + 8 for the next PTE in the PTEG) until PTE[VSID, API, H, V] matches segment descriptor [VSID], EA[API], 0, 1. If the last PTE in the PTEG is reached without a match, the secondary page table search is performed; a secondary page table search hit (from Figure 5-10) rejoins this flow. On a hit, if PTE[R] = 0, PTE[R] and R_Flag are set to 1, and the PTE is written into the TLB. A dcbz instruction with W or I = 1 then causes an alignment exception; otherwise the memory protection violation conditions are checked. If access is prohibited, a memory protection violation results. If access is permitted, a store operation with PTE[C] = 0 sets TLB[PTE[C]] and PTE[C] to 1 (PTE[C] is updated in memory), and the page table search is complete. On each of these paths, PTE[R] is updated in memory if R_Flag = 1.)
Figure 5-10. Secondary Page Table Search Flow
(Flow diagram not reproduced. Summary: the PA of the secondary PTEG is generated, and 64-bit PTEs are fetched one at a time (PA, then PA + 8 for the next PTE in the PTEG) until PTE[VSID, API, H, V] matches segment descriptor [VSID], EA[API], 1, 1. A match is a secondary page table search hit and continues in Figure 5-9. If the last PTE in the PTEG is reached without a match, a page fault occurs: for an instruction access, SRR1[1] is set to 1 and an ISI exception is taken; for a data access, DSISR[1] is set to 1 and a DSI exception is taken.)
5.4.6
Page Table Updates
When TLBs are implemented (as in the MPC750) they are defined as noncoherent caches of the page tables. TLB entries must be flushed explicitly with the TLB invalidate entry instruction (tlbie) whenever the corresponding PTE is modified. As the MPC750 is intended primarily for uniprocessor environments, it does not provide coherency of TLBs between multiple processors. If the MPC750 is used in a multiprocessor environment where TLB coherency is required, all synchronization must be implemented in software. Processors may write referenced and changed bits with unsynchronized, atomic byte store operations. Note that the V, R, and C bits each reside in a distinct byte of a PTE. Therefore, extreme care must be taken to use byte writes when updating only one of these bits. Explicitly altering certain MSR bits (using the mtmsr instruction), or explicitly altering PTEs, or certain system registers, may have the side effect of changing the effective or physical addresses from which the current instruction stream is being fetched. This kind of side effect is defined as an implicit branch. Implicit branches are not supported and an attempt to perform one causes boundedly-undefined results. Therefore, PTEs must not be
changed in a manner that causes an implicit branch. Chapter 2, "PowerPC Register Set," in the Programming Environments Manual, lists the possible implicit branch conditions that can occur when system registers and MSR bits are changed.
5.4.7
Segment Register Updates
Synchronization requirements for using the move to segment register instructions are described in "Synchronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," in the Programming Environments Manual.
Chapter 6 Instruction Timing
This chapter describes how the MPC750 microprocessor fetches, dispatches, and executes instructions and how it reports the results of instruction execution. It gives detailed descriptions of how the MPC750 execution units work, and how those units interact with other parts of the processor, such as the instruction fetching mechanism, register files, and caches. It gives examples of instruction sequences, showing potential bottlenecks and how to minimize their effects. Finally, it includes tables that identify the unit that executes each instruction implemented on the MPC750, the latency for each instruction, and other information that is useful for the assembly language programmer. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
6.1
Terminology and Conventions
This section provides an alphabetical glossary of terms used in this chapter. These definitions are provided as a review of commonly used terms and as a way to point out specific ways these terms are used in this chapter.
* Branch prediction--The process of guessing whether a branch will be taken. Such predictions can be correct or incorrect; the term `predicted' as it is used here does not imply that the prediction is correct (successful). The PowerPC architecture defines a means for static branch prediction as part of the instruction encoding.
* Branch resolution--The determination of whether a branch is taken or not taken. A branch is said to be resolved when the processor can determine which instruction path to take. If the branch is resolved as predicted, the instructions following the predicted branch that may have been speculatively executed can complete (see completion). If the branch is not resolved as predicted, instructions on the mispredicted path, and any results of speculative execution, are purged from the pipeline and fetching continues from the nonpredicted path.
* Completion--Completion occurs when an instruction has finished executing, written back any results, and is removed from the completion queue. When an instruction completes, it is guaranteed that this instruction and all previous instructions can cause no exceptions.
* Fall-through (branch fall-through)--A not-taken branch. On the MPC750, fall-through branch instructions are removed from the instruction stream at dispatch. That is, these instructions are allowed to fall through the instruction queue via the dispatch mechanism, without either being passed to an execution unit or given a position in the completion queue.
* Fetch--The process of bringing instructions from memory (such as a cache or system memory) into the instruction queue.
* Folding (branch folding)--The replacement with target instructions of a branch instruction and any instructions along the not-taken path when a branch is either taken or predicted as taken.
* Finish--Finishing occurs in the last cycle of execution. In this cycle, the completion queue entry is updated to indicate that the instruction has finished executing.
* Latency--The number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction.
* Pipeline--In the context of instruction timing, the term `pipeline' refers to the interconnection of the stages. The events necessary to process an instruction are broken into several cycle-length tasks to allow work to be performed on several instructions simultaneously--analogous to an assembly line. As an instruction is processed, it passes from one stage to the next. When it does, the stage becomes available for the next instruction. Although an individual instruction may take many cycles to complete (the number of cycles is called instruction latency), pipelining makes it possible to overlap the processing so that the throughput (number of instructions completed per cycle) is greater than if pipelining were not implemented.
* Program order--The order of instructions in an executing program. More specifically, this term is used to refer to the original order in which program instructions are fetched into the instruction queue from the cache.
* Rename register--Temporary buffers used by instructions that have finished execution but have not completed.
* Reservation station--A buffer between the dispatch and execute stages that allows instructions to be dispatched even though the results of instructions on which the dispatched instruction may depend are not available.
* Retirement--Removal of the completed instruction from the completion queue.
* Stage--The term `stage' is used in two different senses, depending on whether the pipeline is being discussed as a physical entity or a sequence of events. In the latter case, a stage is an element in the pipeline during which certain actions are performed, such as decoding the instruction, performing an arithmetic operation, or writing back the results. A stage is typically described as taking a processor clock cycle to perform its operation; however, some events (such as dispatch and write-back) happen instantaneously, and may be thought to occur at the end of the stage. An instruction can spend multiple cycles in one stage. An integer multiply, for example, takes multiple cycles in the execute stage. When this occurs, subsequent instructions may stall. In some cases, an instruction may also occupy more than one stage simultaneously, especially in the sense that a stage can be seen as a physical resource--for example, when instructions are dispatched they are assigned a place in the completion queue at the same time they are passed to the execute stage. They can be said to occupy both the complete and execute stages in the same clock cycle.
* Stall--An occurrence when an instruction cannot proceed to the next stage.
* Superscalar--A superscalar processor is one that can issue multiple instructions concurrently from a conventional linear instruction stream. In a superscalar implementation, multiple instructions can be in the execute stage at the same time.
* Throughput--A measure of the number of instructions that are processed per cycle. For example, a series of double-precision floating-point multiply instructions has a throughput of one instruction per clock cycle.
* Write-back--Write-back (in the context of instruction handling) occurs when a result is written into the architectural registers (typically the GPRs and FPRs). Results are written back at completion time. Results in the write-back buffer cannot be flushed. If an exception occurs, these buffers must write back before the exception is taken.
6.2
Instruction Timing Overview
The MPC750 design minimizes average instruction execution latency, the number of clock cycles it takes to fetch, decode, dispatch, and execute instructions and make the results available for a subsequent instruction. Some instructions, such as loads and stores, access memory and require additional clock cycles between the execute phase and the write-back phase. These latencies vary depending on whether the access is to cacheable or noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache access generates a write-back to memory, whether the access causes a snoop hit from another device that generates additional activity, and other conditions that affect memory accesses. The MPC750 implements many features to improve throughput, such as pipelining, superscalar instruction issue, branch folding, removal of fall-through branches, two-level speculative branch handling, and multiple execution units that operate independently and in parallel. As an instruction passes from stage to stage in a pipelined system, the following instruction can follow through the stages as the former instruction vacates them, allowing several instructions to be processed simultaneously. While it may take several cycles for an
instruction to pass through all the stages, when the pipeline has been filled, one instruction can complete its work on every clock cycle. Figure 6-1 represents a generic pipelined execution unit.
          Stage 1         Stage 2         Stage 3
Clock 0   Instruction A   --              --
Clock 1   Instruction B   Instruction A   --
Clock 2   Instruction C   Instruction B   Instruction A
Clock 3   Instruction D   Instruction C   Instruction B
Figure 6-1. Pipelined Execution Unit
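The occupancy pattern of Figure 6-1 follows from a simple rule: with no stalls, instruction i occupies stage s at clock i + s. The sketch below reproduces the figure from that rule. The function name pipeline_occupancy is hypothetical, and the model assumes an ideal pipeline in which every stage takes exactly one clock.

```python
def pipeline_occupancy(instructions, num_stages, num_clocks):
    """Return, for each clock, the instruction in each stage (None if empty)."""
    table = []
    for clock in range(num_clocks):
        row = []
        for stage in range(num_stages):
            index = clock - stage  # instruction i is in stage s at clock i + s
            in_range = 0 <= index < len(instructions)
            row.append(instructions[index] if in_range else None)
        table.append(row)
    return table
```

Calling pipeline_occupancy(["A", "B", "C", "D"], 3, 4) yields the four rows of Figure 6-1; from clock 2 onward every stage is busy and one instruction leaves the last stage per clock.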
The entire path that instructions take through the fetch, decode/dispatch, execute, complete, and write-back stages is considered the MPC750's master pipeline, and two of the MPC750's execution units (the FPU and LSU) are also multiple-stage pipelines. The MPC750 contains the following execution units that operate independently and in parallel:
* Branch processing unit (BPU)
* Integer unit 1 (IU1)--executes all integer instructions
* Integer unit 2 (IU2)--executes all integer instructions except multiplies and divides
* 64-bit floating-point unit (FPU)
* Load/store unit (LSU)
* System register unit (SRU)
The MPC750 can retire two instructions on every clock cycle. In general, the MPC750 processes instructions in four stages--fetch, decode/dispatch, execute, and complete as shown in Figure 6-2. Note that the example of a pipelined execution unit in Figure 6-1 is similar to the three-stage FPU pipeline in Figure 6-2.
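Figure 6-2. Superscalar/Pipeline Diagram
(Diagram not reproduced. Summary: the fetch stage, with a maximum of four instructions fetched per clock cycle, feeds the BPU and the decode/dispatch stage. A maximum of three instructions are dispatched per clock cycle (including one branch instruction). The execute stage comprises the three-stage FPU (FPU1, FPU2, FPU3), the SRU, IU1, IU2, and the two-stage LSU (LSU1, LSU2). The complete (write-back) stage completes a maximum of two instructions per clock cycle.)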
The instruction pipeline stages are described as follows:
* The instruction fetch stage includes the clock cycles necessary to request instructions from the memory system and the time the memory system takes to respond to the request. Instruction fetch timing depends on many variables, such as whether the instruction is in the branch target instruction cache, the on-chip instruction cache, or the L2 cache. Those factors increase when it is necessary to fetch instructions from system memory, and include the processor-to-bus clock ratio, the amount of bus traffic, and whether any cache coherency operations are required. Because there are so many variables, unless otherwise specified, the instruction timing examples below assume optimal performance, that the instructions are available in the instruction queue in the same clock cycle that they are requested. The fetch stage ends when the instruction is dispatched.
* The decode/dispatch stage consists of the time it takes to fully decode the instruction and dispatch it from the instruction queue to the appropriate execution unit. Instruction dispatch requires the following:
  -- Instructions can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1.
  -- A maximum of two instructions can be dispatched per clock cycle (although an additional branch instruction can be handled by the BPU).
  -- Only one instruction can be dispatched to each execution unit per clock cycle.
  -- There must be a vacancy in the specified execution unit.
  -- A rename register must be available for each destination operand specified by the instruction.
  -- For an instruction to dispatch, the appropriate execution unit must be available and there must be an open position in the completion queue. If no entry is available, the instruction remains in the IQ.
* The execute stage consists of the time between dispatch to the execution unit (or reservation station) and the point at which the instruction vacates the execution unit. Most integer instructions have a one-cycle latency; results of these instructions can be used in the clock cycle after an instruction enters the execution unit. However, integer multiply and divide instructions take multiple clock cycles to complete. The IU1 can process all integer instructions; the IU2 can process all integer instructions except multiply and divide instructions. The LSU and FPU are pipelined (as shown in Figure 6-2).
* The complete (complete/write-back) pipeline stage maintains the correct architectural machine state and commits it to the architectural registers at the proper time. If the completion logic detects an instruction containing an exception status, all following instructions are cancelled, their execution results in rename registers are discarded, and the correct instruction stream is fetched. The complete stage ends when the instruction is retired. Two instructions can be retired per cycle. Instructions are retired only from the two lowest completion queue entries, CQ0 and CQ1.
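The dispatch requirements listed above can be combined into a single per-cycle check. The following is a rough sketch rather than a description of the actual dispatch logic: the function name dispatch and its parameters are hypothetical, the rename registers are simplified to a single pool, the target unit of each instruction is given as an input, and in-order dispatch is modeled by stopping at the first instruction that cannot dispatch.

```python
def dispatch(iq, unit_free, free_renames, free_cq_entries):
    """Count the instructions that can dispatch in one cycle (0, 1, or 2).

    iq: list of (unit, num_destinations) tuples in program order, IQ0 first.
    unit_free: dict mapping a unit name to True if it has a vacancy.
    """
    dispatched = 0
    used_units = set()
    for unit, num_dest in iq[:2]:          # only IQ0 and IQ1 are candidates
        if unit in used_units:             # one instruction per unit per cycle
            break
        if not unit_free.get(unit, False):  # the unit must have a vacancy
            break
        if free_renames < num_dest:        # one rename register per destination
            break
        if free_cq_entries < 1:            # an open completion queue position
            break
        used_units.add(unit)
        free_renames -= num_dest
        free_cq_entries -= 1
        dispatched += 1
    return dispatched
```

For example, two integer instructions targeting IU1 and IU2 can both dispatch in one cycle, whereas two load/store instructions cannot, because only one instruction can be dispatched to the LSU per clock cycle.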
The notation conventions used in the instruction timing examples are as follows:
* Fetch--The fetch stage includes the time between when an instruction is requested and when it is brought into the instruction queue. This latency can be highly variable, depending upon whether the instruction is in the BTIC, the on-chip cache, the L2 cache, or system memory (in which case latency can be affected by bus speed and traffic on the system bus, and address translation issues). Therefore, in the examples in this chapter, the fetch stage is usually idealized, that is, an instruction is usually shown to be in the fetch stage when it is a valid instruction in the instruction queue. The instruction queue has six entries, IQ0-IQ5.
* In dispatch entry (IQ0/IQ1)--Instructions can be dispatched from IQ0 and IQ1. Because dispatch is instantaneous, it is perhaps more useful to describe it as an event that marks the point in time between the last cycle in the fetch stage and the first cycle in the execute stage.
* Execute--The operations specified by an instruction are being performed by the appropriate execution unit. The black stripe is a reminder that the instruction occupies an entry in the completion queue, described in Figure 6-3.
* Complete--The instruction is in the completion queue. In the final stage, the results of the executed instruction are written back and the instruction is retired. The completion queue has six entries, CQ0-CQ5.
* In retirement entry--Completed instructions can be retired from CQ0 and CQ1. Like dispatch, retirement is an event that in this case occurs at the end of the final cycle of the complete stage.
Figure 6-3 shows the stages of MPC750 execution units.
Figure 6-3. MPC750 Microprocessor Pipeline Stages
(Diagram not reproduced. Stages by instruction type:)
* IU1/IU2/SRU instructions: Fetch; In Dispatch Entry; Execute (note 1); Complete/Retire
* LSU instructions: Fetch; In Dispatch Entry; Execute (EA Calculation, Cache Align); Complete/Retire
* FPU instructions: Fetch; In Dispatch Entry; Execute (Multiply, Add, Round/Normalize); Complete/Retire
* BPU instructions: Fetch; Predict; In Dispatch Entry; In Completion Queue (note 2); Complete/Retire (note 2)
1 Several integer instructions, such as multiply and divide instructions, require multiple cycles in the execute stage.
2 Only those branch instructions that update the LR or CTR take an entry in the completion queue.
6.3
Timing Considerations
The MPC750 is a superscalar processor; as many as three instructions can be issued to the execution units (one branch instruction to the branch processing unit, and two instructions issued from the dispatch queue to the other execution units) during each clock cycle. Only one instruction can be dispatched to each execution unit. Although instructions appear to the programmer to execute in program order, the MPC750 improves performance by executing multiple instructions at a time, using hardware to manage dependencies. When an instruction is dispatched, the register file provides the source data to the execution unit. The register files and rename registers have sufficient bandwidth to allow dispatch of two instructions per clock under most conditions.
The MPC750's BPU decodes and executes branches immediately after they are fetched. When a conditional branch cannot be resolved due to a CR data dependency, the branch direction is predicted and execution continues from the predicted path. If the prediction is incorrect, the following steps are taken:
1. The instruction queue is purged and fetching continues from the correct path.
2. Any instructions ahead of the predicted branch in the completion queue are allowed to complete.
3. Instructions after the mispredicted branch are purged.
4. Dispatching resumes from the correct path.
After an execution unit finishes executing an instruction, it places resulting data into the appropriate GPR or FPR rename register. The results are then stored into the correct GPR or FPR during the write-back stage. If a subsequent instruction needs the result as a source operand, it is made available simultaneously to the appropriate execution unit, which allows a data-dependent instruction to be decoded and dispatched without waiting to read the data from the register file. Branch instructions that update either the LR or CTR write back their results in a similar fashion. The following section describes this process in greater detail.
6.3.1
General Instruction Flow
As many as four instructions can be fetched into the instruction queue (IQ) in a single clock cycle. Instructions enter the IQ and are issued to the various execution units from the dispatch queue. The MPC750 tries to keep the IQ full at all times, unless instruction cache throttling is operating. The number of instructions requested in a clock cycle is determined by the number of vacant spaces in the IQ during the previous clock cycle. This is shown in the examples in this chapter. Although the instruction queue can accept as many as four new instructions in a single clock cycle, if only one IQ entry is vacant, only one instruction is fetched. Typically instructions are fetched from the on-chip instruction cache, but they may also be fetched from the branch target instruction cache (BTIC). If the instruction request hits in the BTIC, it can usually present the first two instructions of the new instruction stream in the next clock cycle, giving enough time for the next pair of instructions to be fetched from the instruction cache with no idle cycles. If instructions are not in the BTIC or the on-chip instruction cache, they are fetched from the L2 cache or from system memory. The MPC750's instruction cache throttling feature, managed through the instruction cache throttling control (ICTC) register, can lower the processor's overall junction temperature by slowing the instruction fetch rate. See Chapter 10, "Power and Thermal Management." Branch instructions are identified by the fetcher, and forwarded to the BPU directly, bypassing the dispatch queue. If the branch is unconditional or if the specified conditions are already known, the branch can be resolved immediately. That is, the branch direction is
known and instruction fetching can continue from the correct location. Otherwise, the branch direction must be predicted. The MPC750 offers several resources to aid in quick resolution of branch instructions and for improving the accuracy of branch predictions. These include the following:
* Branch target instruction cache--The 64-entry (four-way-associative) branch target instruction cache (BTIC) holds branch target instructions so when a branch is encountered in a repeated loop, usually the first two instructions in the target stream can be fetched into the instruction queue on the next clock cycle. The BTIC can be disabled and invalidated through bits in HID0.
* Dynamic branch prediction--The 512-entry branch history table (BHT) is implemented with two bits per entry for four degrees of prediction--not-taken, strongly not-taken, taken, strongly taken. Whether a branch instruction is taken or not-taken can change the strength of the next prediction. This dynamic branch prediction is not defined by the PowerPC architecture. To reduce aliasing, only predicted branches update the BHT entries. Dynamic branch prediction is enabled by setting HID0[BHT]; otherwise, static branch prediction is used.
* Static branch prediction--Static branch prediction is defined by the PowerPC architecture and involves encoding the branch instructions. See Section 6.4.1.3.1, "Static Branch Prediction."
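The two-bit scheme described for the BHT can be illustrated with a saturating counter. The sketch below is not the MPC750's actual logic: the numeric state encoding, the indexing of the table by low-order word-address bits, and the reset state are assumptions made for illustration. Only the 512 entries and the four prediction states come from the text, and the rule that only predicted branches update the table is not modeled.

```python
# Illustrative 2-bit saturating-counter branch history table.
# Assumptions (not from the manual): state encoding 0..3, indexing by
# word address modulo the table size, and the reset state.
STRONGLY_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONGLY_TAKEN = range(4)

class BranchHistoryTable:
    def __init__(self, entries=512, reset_state=NOT_TAKEN):
        self.entries = entries
        self.table = [reset_state] * entries

    def _index(self, branch_address):
        # Instructions are 4 bytes wide, so drop the two low address bits.
        return (branch_address >> 2) % self.entries

    def predict_taken(self, branch_address):
        return self.table[self._index(branch_address)] >= TAKEN

    def update(self, branch_address, taken):
        # Move one step toward the observed outcome, saturating at the ends.
        i = self._index(branch_address)
        if taken:
            self.table[i] = min(self.table[i] + 1, STRONGLY_TAKEN)
        else:
            self.table[i] = max(self.table[i] - 1, STRONGLY_NOT_TAKEN)

bht = BranchHistoryTable()
loop_branch = 0x1000  # hypothetical branch address
for outcome in (True, True, True, False):
    bht.update(loop_branch, outcome)
```

After three taken outcomes and one not-taken outcome the entry has moved from strongly taken back to taken, so a single loop exit weakens the prediction without reversing it.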
Branch instructions that do not update the LR or CTR are removed from the instruction stream either by branch folding or removal of fall-through branch instructions, as described in Section 6.4.1.1, "Branch Folding and Removal of Fall-Through Branch Instructions." Branch instructions that update the LR or CTR are treated as if they require dispatch (even though they are not issued to an execution unit in the process). They are assigned a position in the completion queue to ensure that the CTR and LR are updated sequentially. All other instructions are issued from IQ0 and IQ1. The dispatch rate depends upon the availability of resources such as the execution units, rename registers, and completion queue entries, and upon the serializing behavior of some instructions. Instructions are dispatched in program order; an instruction in IQ1 cannot be dispatched ahead of one in IQ0. Figure 6-4 shows the paths taken by instructions.
[Figure 6-4 diagram: instructions are fetched (maximum four per clock cycle) into the instruction queue (IQ5-IQ0, in program order). Branch instructions go to the branch processing unit. Instructions are dispatched (maximum two per clock cycle, one per unit) and assigned completion queue entries, then pass to the reservation stations of the FPU, LSU (with the store queue), IU1, IU2, and SRU. Instructions complete (retire) from the completion queue (CQ5-CQ0, in program order).]
Figure 6-4. Instruction Flow Diagram
6.3.2 Instruction Fetch Timing
Instruction fetch latency depends on whether the fetch hits the BTIC, the on-chip instruction cache, or the L2 cache, if one is implemented. If no cache hit occurs, a memory
transaction is required in which case fetch latency is affected by bus traffic, bus clock speed, and memory translation. These issues are discussed further in the following sections.
6.3.2.1 Cache Arbitration
When the instruction fetcher requests instructions from the instruction cache, two things may happen. If the instruction cache is idle and the requested instructions are present, they are provided on the next clock cycle. However, if the instruction cache is busy due to a cache-line-reload operation, instructions cannot be fetched until that operation completes.
6.3.2.2 Cache Hit
If the instruction fetch hits the instruction cache, it takes only one clock cycle after the request for as many as four instructions to enter the instruction queue. Note that the cache is not blocked to internal accesses while a cache reload completes (hits under misses). The critical double word is written simultaneously to the cache and forwarded to the requesting unit, minimizing stalls due to load delays. Figure 6-5 shows a simple example of instruction fetching that hits in the on-chip cache. This example uses a series of integer add and double-precision floating-point add instructions to show how the number of instructions to be fetched is determined, how program order is maintained by the instruction and completion queues, how instructions are dispatched and retired in pairs (maximum), and how the FPU, IU1, and IU2 pipelines function. The following instruction sequence is examined:
    0   add
    1   fadd
    2   add
    3   fadd
    4   br 6
    5   fsub
    6   fadd
    7   fadd
    8   add
    9   add
    10  add
    11  add
    12  fadd
    13  add
    14  fadd
    .
    .
    .
[Figure 6-5 timing diagram: clock cycles 0-11 for instructions 0-14 (0 add, 1 fadd, 2 add, 3 fadd, 4 b, 5 fsub, 6 fadd, 7 fadd, 8 add, 9 add, 10 add, 11 add, 12 fadd, 13 add, 14 fadd), showing the fetch (in IQ), in dispatch entry (IQ0/IQ1), execute, complete (in CQ), and in retirement entry (CQ0/CQ1) stages, together with the contents of the instruction queue and completion queue in each cycle. Instructions 15-18 appear in parentheses and are shown only in the instruction queue.]
Figure 6-5. Instruction Timing--Cache Hit
The instruction timing for this example is described cycle-by-cycle as follows:
0. In cycle 0, instructions 0-3 are fetched from the instruction cache. Instructions 0 and 1 are placed in the two entries in the instruction queue from which they can be dispatched on the next clock cycle.
1. In cycle 1, instructions 0 and 1 are dispatched to the IU2 and FPU, respectively. Notice that for instructions to be dispatched they must be assigned positions in the completion queue. In this case, since the completion queue was empty, instructions 0 and 1 take the two lowest entries in the completion queue. Instructions 2 and 3 drop into the two dispatch positions in the instruction queue. Because there were two positions available in the instruction queue in clock cycle 0, two instructions (4 and 5) are fetched into the instruction queue. Instruction 4 is a branch unconditional instruction, which resolves immediately as taken. Because the branch is taken, it can therefore be folded from the instruction queue.
2. In cycle 2, assume a BTIC hit occurs and target instructions 6 and 7 are fetched into the instruction queue, replacing the folded b instruction (4) and instruction 5. Instruction 0 completes, writes back its results and vacates the completion queue by the end of the clock cycle. Instruction 1 enters the second FPU execute stage, instruction 2 is dispatched to the IU2, and instruction 3 is dispatched into the first FPU execute stage. Because the taken branch instruction (4) does not update either CTR or LR, it does not require a position in the completion queue and can be folded.
3. In cycle 3, target instructions (6 and 7) are fetched, replacing instructions 4 and 5 in IQ0 and IQ1. This replacement on taken branches is called branch folding. Instruction 1 proceeds through the last of the three FPU execute stages. Instruction 2 has executed but must remain in the completion queue until instruction 1 completes. Instruction 3 replaces instruction 1 in the second stage of the FPU, and instruction 6 replaces instruction 3 in the first stage. Also, as will be shown in cycle 4, there is a single-cycle stall that occurs when the FPU pipeline is full. Because there were four vacancies in the instruction queue in the previous clock cycle, instructions 8-11 are fetched in this clock cycle.
4. Instruction 1 completes in cycle 4, allowing instruction 2 to complete. Instructions 3 and 6 continue through the FPU pipeline. Although instruction 7 is in IQ0, it cannot be dispatched because the FPU is busy, and because instruction 7 cannot be dispatched neither can instruction 8. The additional cycle stall allows the instruction queue to be completely filled. Because there was one opening in the instruction queue in clock cycle 3, one instruction is fetched (12) and the instruction queue is full.
5. In cycle 5, instruction 3 completes, allowing instruction 7 to be dispatched to the FPU, which in turn allows instruction 8 to be dispatched to the IU2. Instructions 9 and 10 drop to the dispatch positions in the instruction queue. No instructions are fetched in this clock cycle because there were no vacant IQ entries in clock cycle 4.
6. In cycle 6, instruction 6 completes, instruction 7 is in stage 2 of the FPU execute stage, and although instruction 8 has executed, it must wait for instruction 7 to complete. The two integer instructions, 9 and 10, are dispatched to the IU2 and IU1, respectively. Fetching resumes with instructions 13 and 14.
7. In cycle 7, instruction 7 is in the final FPU execute stage and instructions 8-10 wait in the completion queue. Instructions 11 and 12 are dispatched to the IU2 and FPU, respectively. Note that at this point the completion queue is full. Two more instructions (15 and 16, which are shown only in the instruction queue) are fetched.
8. In cycle 8, instructions 7-11 are through executing. Instructions 7 and 8 complete, write back, and vacate the completion queue. Because the completion queue is full, instructions 13 and 14 cannot be dispatched and must remain in the instruction queue. Only the FPU is executing during this cycle (instruction 12). Additional instructions (instructions 17 and 18, shown only in the instruction queue) are fetched, filling the instruction queue.
9. In cycle 9, two more instructions (instructions 9 and 10) are retired from the completion queue allowing instructions 13 and 14 to be dispatched, again filling the completion queue. No instructions are fetched on this cycle because the instruction queue was full on the previous clock cycle.
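The fetch counts in this walkthrough can be cross-checked against the rule that the number of instructions requested equals the number of vacant IQ entries in the previous cycle, up to four. The following is a minimal sketch: the function and variable names are illustrative, and the occupancy values are tallied by hand from the dispatch and fetch events in the steps, not taken from any tool.

```python
# Sketch of the fetch-request rule: instructions requested in a cycle equal
# the vacant IQ entries in the previous cycle, capped at four.
IQ_ENTRIES = 6
MAX_FETCH = 4

def fetch_request(prev_occupancy):
    vacancies = IQ_ENTRIES - prev_occupancy
    return min(MAX_FETCH, vacancies)

# IQ occupancy at the end of cycles 0..8, tallied from the walkthrough
# (entries fetched so far minus entries dispatched or folded).
occupancy = [4, 4, 2, 5, 6, 4, 4, 4, 6]

# Instructions fetched in cycles 1..9 according to the walkthrough.
fetched = {1: 2, 2: 2, 3: 4, 4: 1, 5: 0, 6: 2, 7: 2, 8: 2, 9: 0}

mismatches = []
for cycle, count in fetched.items():
    if cycle == 2:
        continue  # the cycle-2 pair comes from the BTIC hit, so skip it here
    if fetch_request(occupancy[cycle - 1]) != count:
        mismatches.append(cycle)
```

With these values `mismatches` stays empty, which is the consistency one would expect if every non-BTIC fetch in the example follows the vacancy rule.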
6.3.2.3 Cache Miss
Figure 6-6 shows an instruction fetch that misses both the on-chip cache and L2 cache. A processor/bus clock ratio of 2:1 is used. The same instruction sequence is used as in Section 6.3.2.2, "Cache Hit"; however, in this example, the branch target instruction is not in either the L1 or L2 cache. Because the target instruction is not in the L1 cache, it cannot be in the BTIC. A cache miss extends the latency of the fetch stage, so in this example, the fetch stage shown represents not only the time the instruction spends in the IQ, but also the time required for the instruction to be loaded from system memory, beginning in clock cycle 2. During clock cycle 3, the target instruction for the b instruction is not in the BTIC, the instruction cache, or the L2 cache; therefore, a memory access must occur. During clock cycle 5, the address of the block of instructions is sent to the system bus. During clock cycle 7, two instructions (64 bits) are returned from memory on the first beat and are forwarded both to the cache and the instruction fetcher.
[Figure 6-6 timing diagram: clock cycles 0-11 for instructions 0-13, with bus Address and Data rows, showing the fetch, in dispatch entry (IQ0/IQ1), execute, complete (in CQ), and in retirement entry (CQ0/CQ1) stages, together with the contents of the instruction queue and completion queue in each cycle. Instructions 6-13 are marked with an asterisk.]
* Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency.
Figure 6-6. Instruction Timing--Cache Miss
6.3.2.4 L2 Cache Access Timing Considerations
If an instruction fetch misses both the BTIC and the on-chip instruction cache, the MPC750 next looks in the L2 cache. (Note that the MPC740 does not implement the L2 cache
interface.) If the requested instructions are there, they are burst into the MPC750 in much the same way as shown in Figure 6-6. The formula for the L2 cache latency for instruction accesses is as follows:
    1 processor clock + 3 L2 clocks + 1 processor clock
Therefore, if the L2 is operating in 2:1 mode, the instruction fetch takes 8 processor clock cycles. Additional factors can also affect this latency, including the type of memory used to implement the L2 and whether the processor clock and L2 clocks are aligned immediately. For more information about the L2 cache implementation, see Chapter 9, "L2 Cache Interface Operation."
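The latency formula can be worked through for a few clock ratios. This is a small sketch under the stated formula only; the function name is illustrative, the ratios other than 2:1 are example inputs, and the additional factors mentioned in the text (memory type, clock alignment) are ignored.

```python
# Nominal L2 instruction-fetch latency in processor clocks, following
# 1 processor clock + 3 L2 clocks + 1 processor clock.
# core_to_l2_ratio is the number of processor clocks per L2 clock.
def l2_fetch_latency(core_to_l2_ratio):
    return 1 + 3 * core_to_l2_ratio + 1

for ratio in (1, 2, 3):  # example ratios; 2 is the case given in the text
    print(f"{ratio}:1 mode -> {l2_fetch_latency(ratio)} processor clocks")
```

For the 2:1 case this gives 1 + 3 x 2 + 1 = 8 processor clocks, matching the figure quoted in the text.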
6.3.3 Instruction Dispatch and Completion Considerations
Several factors affect the MPC750's ability to dispatch instructions at a peak rate of two per cycle--the availability of the execution unit, destination rename registers, and completion queue, as well as the handling of completion-serialized instructions. Several of these limiting factors are illustrated in the previous instruction timing examples. To reduce dispatch unit stalls due to instruction data dependencies, the MPC750 provides a single-entry reservation station for the FPU, SRU, and each IU, and a two-entry reservation station for the LSU. If a data dependency keeps an instruction from starting execution, that instruction is dispatched to the reservation station associated with its execution unit (and the rename registers are assigned), thereby freeing the positions in the instruction queue so instructions can be dispatched to other execution units. Execution begins during the same clock cycle that the rename buffer is updated with the data the instruction is dependent on. If both instructions in IQ0 and IQ1 require the same execution unit, the instruction in IQ1 cannot be dispatched until the first instruction proceeds through the pipeline and provides the subsequent instruction with a vacancy in the requested execution unit. The completion unit maintains program order after instructions are dispatched from the instruction queue, guaranteeing in-order completion and a precise exception model. Completing an instruction implies committing execution results to the architected destination registers. In-order completion ensures the correct architectural state when the MPC750 must recover from a mispredicted branch or an exception. Instruction state and all information required for completion are kept in the six-entry, first-in/first-out completion queue. A completion queue entry is allocated for each instruction when it is dispatched to an execution unit; if no entry is available, the dispatch unit stalls.
A maximum of two instructions per cycle may be completed and retired from the completion queue, and the flow of instructions can stall when a longer-latency instruction reaches the last position in the completion queue. Subsequent instructions cannot be completed and retired until that longer-latency instruction completes and retires. Examples of this are shown in Section 6.3.2.2, "Cache Hit," and Section 6.3.2.3, "Cache Miss."
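The dispatch constraints described in this section can be summarized in a simplified check for a single clock cycle. The sketch below is an illustration, not the MPC750's dispatch logic: the dictionary field names are hypothetical, each instruction is tied to one named unit, a single counter stands in for the separate rename register pools, and serializing instructions are not modeled.

```python
# Simplified per-cycle dispatch check. Each instruction is a dict with a
# 'unit' name and a 'needs_rename' flag (hypothetical field names).
def dispatch_count(iq0, iq1, free_units, cq_free, rename_free):
    dispatched = 0
    units = set(free_units)
    for instr in (iq0, iq1):            # program order: IQ0 before IQ1
        if instr is None:
            break
        if cq_free == 0:                # no completion queue entry available
            break
        if instr["unit"] not in units:  # unit (or its reservation station) busy
            break
        if instr["needs_rename"] and rename_free == 0:
            break
        units.discard(instr["unit"])    # at most one instruction per unit
        cq_free -= 1
        if instr["needs_rename"]:
            rename_free -= 1
        dispatched += 1
    return dispatched

add = {"unit": "IU2", "needs_rename": True}
fadd = {"unit": "FPU", "needs_rename": True}
all_units = {"IU1", "IU2", "FPU"}

both = dispatch_count(add, fadd, all_units, cq_free=6, rename_free=6)
same_unit = dispatch_count(fadd, fadd, all_units, cq_free=6, rename_free=6)
blocked = dispatch_count(fadd, add, {"IU1", "IU2"}, cq_free=6, rename_free=6)
```

The third call reflects the in-order rule: when the instruction in IQ0 cannot be dispatched, the one in IQ1 is held back as well, even though its unit is free.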
The MPC750 can execute instructions out-of-order, but in-order completion by the completion unit ensures a precise exception mechanism. Program-related exceptions are signaled when the instruction causing the exception reaches the last position in the completion queue. Prior instructions are allowed to complete before the exception is taken.
6.3.3.1 Rename Register Operation
To avoid contention for a given register file location in the course of out-of-order execution, the MPC750 provides rename registers for holding instruction results before the completion unit commits them to the architected register. There are six GPR rename registers, six FPR rename registers, and one each for the CR, LR, and CTR. When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename register (or registers) for the results of that instruction. If an instruction is dispatched to a reservation station associated with an execution unit due to a data dependency, the dispatcher also provides a tag to the execution unit identifying the rename register that forwards the required data at completion. When the source data reaches the rename register, execution can begin. Instruction results are transferred from the rename registers to the architected registers by the completion unit when an instruction is retired from the completion queue without exceptions and after any predicted branch conditions preceding it in the completion queue have been resolved correctly. If a branch prediction was incorrect, the instructions following the branch are flushed from the completion queue, and any results of those instructions are flushed from the rename registers.
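The life cycle of a rename register (allocated at dispatch, committed at retirement, discarded on a flush) can be sketched as a toy model. The class and method names below are illustrative only and do not correspond to any hardware interface; tag forwarding to reservation stations is not modeled.

```python
# Toy model of a rename-register pool: allocate at dispatch, commit at
# retirement, discard on a misprediction flush.
class RenamePool:
    def __init__(self, size=6):        # six GPR (or six FPR) rename registers
        self.free = list(range(size))
        self.pending = {}               # rename tag -> (destination, value)

    def allocate(self, dest):
        if not self.free:
            return None                 # no rename register: dispatch stalls
        tag = self.free.pop(0)
        self.pending[tag] = (dest, None)
        return tag

    def write_result(self, tag, value):
        dest, _ = self.pending[tag]
        self.pending[tag] = (dest, value)

    def retire(self, tag, architected):
        dest, value = self.pending.pop(tag)
        architected[dest] = value       # commit to the architected register
        self.free.append(tag)

    def flush(self, tags):
        for tag in tags:                # results on the wrong path are dropped
            self.pending.pop(tag)
            self.free.append(tag)

gprs = {"r3": 0, "r4": 0}
pool = RenamePool()
t1 = pool.allocate("r3")
t2 = pool.allocate("r4")
pool.write_result(t1, 7)
pool.write_result(t2, 9)
pool.retire(t1, gprs)   # r3 is committed in program order
pool.flush([t2])        # r4's result belonged to a mispredicted path
```

Because the architected registers are written only in `retire`, the flushed result never becomes visible, which is the property that keeps recovery from a misprediction simple.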
6.3.3.2 Instruction Serialization
Although the MPC750 can dispatch and complete two instructions per cycle, so-called serializing instructions limit dispatch and completion to one instruction per cycle. There are three types of instruction serialization:
* Execution serialization--Execution-serialized instructions are dispatched, held in the functional unit and do not execute until all prior instructions have completed. A functional unit holding an execution-serialized instruction will not accept further instructions from the dispatcher. For example, execution serialization is used for instructions that modify nonrenamed resources. Results from these instructions are generally not available or forwarded to subsequent instructions until the instruction completes (using mtspr to write to LR or CTR does provide forwarding to branch instructions).
* Completion serialization (also referred to as post-dispatch or tail serialization)--Completion-serialized instructions inhibit dispatching of subsequent instructions until the serialized instruction completes. Completion serialization is used for instructions that bypass the normal rename mechanism.
* Refetch serialization (flush serialization)--Refetch-serialized instructions inhibit dispatch of subsequent instructions and force refetching of subsequent instructions after completion.
6.4 Execution Unit Timings
The following sections describe instruction timing considerations within each of the respective execution units in the MPC750.
6.4.1 Branch Processing Unit Execution Timing
Flow control operations (conditional branches, unconditional branches, and traps) are typically expensive to execute in most machines because they disrupt normal flow in the instruction stream. When a change in program flow occurs, the IQ must be reloaded with the target instruction stream. Previously issued instructions will continue to execute while the new instruction stream makes its way into the IQ, but depending on whether the target instruction is in the BTIC, instruction cache, L2 cache, or in system memory, some opportunities may be missed to execute instructions, as the example in Section 6.3.2.3, "Cache Miss," shows. Performance features such as branch folding, removal of fall-through branch instructions, the BTIC, dynamic branch prediction (implemented in the BHT), two-level branch prediction, and the implementation of nonblocking caches minimize the penalties associated with flow control operations on the MPC750. The timing for branch instruction execution is determined by many factors including the following:
* Whether the branch is taken
* Whether instructions in the target stream, typically the first two instructions in the target stream, are in the branch target instruction cache (BTIC)
* Whether the target instruction stream is in the on-chip cache
* Whether the branch is predicted
* Whether the prediction is correct
6.4.1.1 Branch Folding and Removal of Fall-Through Branch Instructions
When a branch instruction is encountered by the fetcher, the BPU immediately begins to decode it and tries to resolve it. All branch instructions except those that update either the LR or CTR are removed from the instruction flow before they would take a position in the completion queue. Branch folding occurs either when a branch is taken or is predicted as taken (as is the case with unconditional branches). When the BPU folds the branch instruction out of the
instruction stream, the target instruction stream that is fetched into the instruction queue overwrites the branch instruction. Figure 6-7 shows branch folding. Here a b instruction is encountered in a series of add instructions. The branch is resolved as taken. What happens on the next clock cycle depends on whether the target instruction stream is in the BTIC, the instruction cache, or if it must be fetched from the L2 cache or from system memory. Figure 6-7 shows cases where there is a BTIC hit, and where there is a BTIC miss (and instruction cache hit). If there is a BTIC hit, on the next clock cycle the b instruction is replaced by the first target instruction, and1, that was found in the BTIC; the second target instruction, and2, is also fetched from the BTIC. On the next clock cycle, the next four and instructions from the target stream are fetched from the instruction cache. If the target instruction is not in the BTIC, there is an idle cycle while the fetcher attempts to fetch the first four instructions from the instruction cache (on the next clock cycle). In the example in Figure 6-7, the first four target instructions are fetched on the next clock. If the fetch misses in the caches, an L2 cache or memory access is required, the latency of which is dependent on several factors, such as processor/bus clock ratios. In most cases, new instructions arrive in the IQ before the execution units become idle.
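[Figure 6-7 diagram, two panels of instruction queue contents, listed from the newest entry down to IQ0.
Branch folding, taken branch with BTIC hit: Clock 0: add5, add4, add3, b, add2, add1 (IQ5-IQ0). Clock 1: and2, and1. Clock 2: and6, and5, and4, and3.
Branch folding, taken branch with BTIC miss: Clock 0: add5, add4, add3, b, add2, add1 (IQ5-IQ0). Clock 1: no target instructions. Clock 2: and4, and3, and2, and1.]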
Figure 6-7. Branch Folding
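The difference between the two cases in Figure 6-7 comes down to when target instructions reach the IQ after the branch is folded in clock 0. The sketch below encodes only what the text states for a BTIC hit and for a BTIC miss with an instruction cache hit; the function names and the dictionary layout are illustrative.

```python
# Clock-by-clock arrival of target instructions after a taken branch is
# folded in clock 0, for the two cases described for Figure 6-7.
def target_arrivals(btic_hit):
    if btic_hit:
        # Clock 1: two instructions from the BTIC.
        # Clock 2: the next four from the instruction cache.
        return {1: ["and1", "and2"], 2: ["and3", "and4", "and5", "and6"]}
    # BTIC miss, instruction cache hit: clock 1 is idle.
    return {1: [], 2: ["and1", "and2", "and3", "and4"]}

def delivered_by(clock, btic_hit):
    arrivals = target_arrivals(btic_hit)
    return sum(len(instrs) for c, instrs in arrivals.items() if c <= clock)
```

For example, `delivered_by(1, True)` is 2 while `delivered_by(1, False)` is 0, which is the one idle fetch cycle that the BTIC removes.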
Figure 6-8 shows the removal of fall-through branch instructions, which occurs when a branch is not taken or is predicted as not taken.
[Figure 6-8 diagram, Branch Fall-Through (Not-Taken Branch). Instruction queue contents, listed from the newest entry down to IQ0: Clock 0: add5, add4, add3, b, add2, add1 (IQ5-IQ0). Clock 1: add5, add4, add3, b. Clock 2: add7, add6, add5, add4.]
Figure 6-8. Removal of Fall-Through Branch Instruction
In this case the branch instruction remains in the instruction queue and is removed from the instruction stream as if it were dispatched. However, it is not dispatched to an execution unit and is not assigned an entry in the completion queue. When a branch instruction is detected before it reaches a dispatch position, and if the branch is correctly predicted as taken, folding the branch instruction (and any instructions from the incorrect path) reduces the latency required for flow control to zero; instruction execution proceeds as though the branch was never there. The advantage of removing the fall-through branch instructions at dispatch is only marginally less than that of branch folding. Because the branch is not taken, only the branch instruction needs to be discarded. The only cost of expelling the branch instruction from one of the dispatch entries rather than folding it is missing a chance to dispatch an executable instruction from that position.
6.4.1.2 Branch Instructions and Completion
As described in the previous section, branch instructions that do not update either the LR or CTR are removed from the instruction stream before they reach the completion queue, either by branch folding (in the case of taken branches) or by removing fall-through branch instructions at dispatch (in the case of non-taken branches). However, branch instructions that update the architected LR and CTR must do so in program order and therefore must perform write-back in the completion stage, like the instructions that update the FPRs and GPRs. Branch instructions that update the CTR or LR pass through the instruction queue like nonbranch instructions. At the point of dispatch, however, they are not sent to an execution unit, but rather are assigned a slot in the completion queue, as shown in Figure 6-9.
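[Figure 6-9 diagram, Branch Completion (LR/CTR Write-Back). Instruction queue contents, listed from the newest entry down to IQ0: Clock 0: add5, add4, add3, bc, add2, add1 (IQ5-IQ0). Clock 1: add5, add4, add3, bc. Clock 2: add7, add6, add5, add4. Clock 3: add9, add8, add7, add6. Completion queue contents (CQ5-CQ0) in successive clocks: add2, add1; then add3, bc; then add5, add4.]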
Figure 6-9. Branch Completion
In this example, the bc instruction is encoded to decrement the CTR. It is predicted as not-taken in clock cycle 0. In clock cycle 2, bc and add3 are both dispatched. In clock cycle 3, the architected CTR is updated and the bc instruction is retired from the completion queue.
6.4.1.3 Branch Prediction and Resolution
The MPC750 supports the following two types of branch prediction:
* Static branch prediction--This is defined by the PowerPC architecture as part of the encoding of branch instructions.
* Dynamic branch prediction--This is a processor-specific mechanism implemented in hardware (in particular the branch history table, or BHT) that monitors branch instruction behavior and maintains a record from which the next occurrence of the branch instruction is predicted.
When a conditional branch cannot be resolved due to a CR data dependency, the BPU predicts whether it will be taken, and instruction fetching proceeds down the predicted path. If the branch prediction resolves as incorrect, the instruction queue and all subsequently executed instructions are purged, instructions executed prior to the predicted branch are allowed to complete, and instruction fetching resumes down the correct path.
The MPC750 executes through two levels of prediction. Instructions from the first unresolved branch can execute, but they cannot complete until the branch is resolved. If a second branch instruction is encountered in the predicted instruction stream, it can be predicted and instructions can be fetched, but not executed, from the second branch. No action can be taken for a third branch instruction until at least one of the two previous branch instructions is resolved.
The number of instructions that can be executed after the issue of a predicted branch instruction is limited by the fact that no instruction executed after a predicted branch may actually update the register files or memory until the branch is completed. That is, instructions may be issued and executed, but cannot reach the write-back stage in the completion unit. When an instruction following a predicted branch completes execution, it does not write back its results to the architected registers; instead, it stalls in the completion queue. Of course, when the completion queue is full, no additional instructions can be dispatched, even if an execution unit is idle.
In the case of a misprediction, the MPC750 can easily redirect its machine state because the programming model has not been updated. When a branch is mispredicted, all instructions that were dispatched after the predicted branch instruction are flushed from the completion queue and any results are flushed from the rename registers.
The BTIC is a cache of recently used branch target instructions.
If the search for the branch target hits in the cache, the first one or two target instructions are available in the instruction queue on the next cycle (shown in Figure 6-5). Two instructions are fetched on a BTIC hit,
unless the branch target is the last instruction in a cache block, in which case one instruction is fetched. In some situations, an instruction sequence creates dependencies that keep a branch instruction from being resolved immediately, thereby delaying execution of the subsequent instruction stream based on the predicted outcome of the branch instruction. The instruction sequences and the resulting action of the branch instruction are described as follows:
* An mtspr(LR) followed by a bclr--Fetching stops and the branch waits for the mtspr to execute.
* An mtspr(CTR) followed by a bcctr--Fetching stops and the branch waits for the mtspr to execute.
* An mtspr(CTR) followed by a bc (CTR decrement)--Fetching stops and the branch waits for the mtspr to execute.
* A third bc(based-on-CR) is encountered while there are two unresolved bc(based-on-CR)--The third bc(based-on-CR) is not executed and fetching stops until one of the previous bc(based-on-CR) is resolved. (Note that branch conditions can be a function of the CTR and the CR; if the CTR condition is sufficient to resolve the branch, then a CR-dependency is ignored.)
6.4.1.3.1 Static Branch Prediction
The PowerPC architecture provides a field in branch instructions (the BO field) to allow software to hint whether a branch is likely to be taken. Rather than delaying instruction processing until the condition is known, the MPC750 uses the instruction encoding to predict whether the branch is likely to be taken and begins fetching and executing along that path. When the branch condition is known, the prediction is evaluated. If the prediction was correct, program flow continues along that path; otherwise, the processor flushes any instructions and their results from the mispredicted path, and program flow resumes along the correct path. Static branch prediction is used when HID0[BHT] is cleared. That is, the branch history table, which is used for dynamic branch prediction, is disabled. For information about static branch prediction, see "Conditional Branch Control," in Chapter 4, "Addressing Modes and Instruction Set Summary," in the Programming Environments Manual.
6.4.1.3.2 Predicted Branch Timing Examples
Figure 6-10 shows cases where branch instructions are predicted. It shows how both taken and not-taken branches are handled and how the MPC750 handles both correct and incorrect predictions. The example shows the timing for the following instruction sequence:
    0   add
    1   add
    2   bc
    3   mulhw
    4   bc T0
    5   fadd
    6   and
    T0  add
    T1  add
    T2  add
    T3  add
    T4  and
    T5  or
[Figure 6-10 timing diagram: clock cycles 0-10 for instructions 0-5 (0 add, 1 add, 2 bc, 3 mulhw, 4 bc, 5 fadd), target instructions T0-T5 (T0 add, T1 add, T2 add, T3 add, T4 and, T5 or), and the refetched instructions 5 fadd and 6 and (both marked with an asterisk), showing the fetch, in dispatch entry (IQ0/IQ1), predict, execute, complete (in CQ), and in retirement entry (CQ0/CQ1) stages, together with the contents of the instruction queue and completion queue in each cycle.]
* Instructions 5 and 6 are not in the IQ in clock cycle 5. Here, the fetch stage shows cache latency.
Figure 6-10. Branch Instruction Timing
0. During clock cycle 0, instructions 0 and 1 are dispatched to their respective execution units. Instruction 2 is a branch instruction that updates the CTR. It is predicted as not taken in clock cycle 0. Instruction 3 is a mulhw instruction on which instruction 4 depends.
1. In clock cycle 1, instructions 2 and 3 enter the dispatch entries in the IQ. Instruction 4 (a second bc instruction) and 5 are fetched. The second bc instruction is predicted as taken. It can be folded, but it cannot be resolved until instruction 3 writes back.
2. In clock cycle 2, instruction 4 has been folded and instruction 5 has been flushed from the IQ. The two target instructions, T0 and T1, are both in the BTIC, so they are fetched in this cycle. Note that even though the first bc instruction may not have resolved by this point (we can assume it has), the MPC750 allows fetching from a second predicted branch stream. However, these instructions could not be dispatched until the previous branch has resolved.
3. In clock cycle 3, target instructions T2-T5 are fetched as T0 and T1 are dispatched.
4. In clock cycle 4, instruction 3, on which the second branch instruction depended, writes back and the branch prediction is proven incorrect. Even though T0 is in CQ1, from which it could be written back, it is not written back because the branch prediction was incorrect. All target instructions are flushed from their positions in the pipeline at the end of this clock cycle, as are any results in the rename registers. After one clock cycle required to refetch the original instruction stream, instruction 5, the same instruction that was fetched in clock cycle 1, is brought back into the IQ from the instruction cache, along with three others (not all of which are shown).
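The recovery in step 4 can be sketched as a split of the completion queue at the mispredicted branch: older entries stay and complete, younger entries are dropped. The list-based queue and the function name are illustrative assumptions; the hardware mechanism is not described at this level in the text.

```python
# Sketch of misprediction recovery: entries dispatched after the predicted
# branch are flushed; entries dispatched before it are kept.
def flush_after_branch(completion_queue, entries_before_branch):
    """completion_queue is in program order, oldest first (index 0 = CQ0)."""
    kept = completion_queue[:entries_before_branch]
    flushed = completion_queue[entries_before_branch:]
    return kept, flushed

# Cycle 4 of the example: instruction 3 (mulhw) precedes the folded branch
# (instruction 4); T0 and T1 were dispatched from the predicted path.
kept, flushed = flush_after_branch(["3 mulhw", "T0 add", "T1 add"], 1)
```

Here `kept` holds only the mulhw and `flushed` holds T0 and T1, consistent with T0 sitting in CQ1 yet never being written back.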
6.4.2 Integer Unit Execution Timing
The MPC750 has two integer units. The IU1 can execute all integer instructions, and the IU2 can execute all integer instructions except multiply and divide instructions. As shown in Figure 6-2, each integer unit has one execute pipeline stage; thus, when a multicycle integer instruction is being executed, no other integer instructions can begin to execute in that unit. Table 6-6 lists integer instruction latencies. Most integer instructions have an execution latency of one clock cycle.
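The split of work between the two integer units can be expressed as a small lookup. This is a sketch: the function name is illustrative, and the mnemonic set is an assumed list of common PowerPC integer multiply and divide instructions rather than a list taken from this section.

```python
# Which integer units can accept an instruction, following the IU1/IU2 split.
# MULTIPLY_DIVIDE is an assumed, non-exhaustive set of mnemonics.
MULTIPLY_DIVIDE = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}

def eligible_integer_units(mnemonic):
    if mnemonic in MULTIPLY_DIVIDE:
        return ["IU1"]           # IU2 does not execute multiply or divide
    return ["IU1", "IU2"]
```

For example, `eligible_integer_units("mulhw")` returns only IU1, so two adjacent multiplies cannot be dispatched in the same cycle, whereas two adds can be.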
6.4.3 Floating-Point Unit Execution Timing
The floating-point unit on the MPC750 executes all floating-point instructions. Execution of most floating-point instructions is pipelined within the FPU, allowing up to three instructions to be executing in the FPU concurrently. While most floating-point instructions execute with three- or four-cycle latency, and one- or two-cycle throughput, three instructions (fdivs, fdiv, and fres) execute with latencies of 11 to 33 cycles. The fdivs, fdiv, fres, mtfsb0, mtfsb1, mtfsfi, mffs, and mtfsf instructions block the floating-point unit
pipeline until they complete execution, and thereby inhibit the dispatch of additional floating-point instructions. See Table 6-7 for floating-point instruction execution timing.
6.4.4
Effect of Floating-Point Exceptions on Performance
For the fastest and most predictable floating-point performance, all exceptions should be disabled in the FPSCR and MSR.
6.4.5
Load/Store Unit Execution Timing
The execution of most load and store instructions is pipelined. The LSU has two pipeline stages. The first is for effective address calculation and MMU translation and the second is for accessing data in the cache. Load and store instructions have a two-cycle latency and one-cycle throughput. If operands are misaligned, additional latency may be required either for an alignment exception to be taken or for additional bus accesses. Load instructions that miss in the cache block subsequent cache accesses during the cache line refill. Table 6-8 gives load and store instruction execution latencies.
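The two-cycle latency and one-cycle throughput quoted above can be turned into a rough cycle estimate for a run of back-to-back accesses. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes every access is aligned, hits in the cache, and is never stalled.

```python
def lsu_cycles(count, latency=2, throughput=1):
    """Estimate cycles for `count` back-to-back pipelined loads or stores.

    Assumes cache hits, aligned operands, and no stalls (an idealization).
    """
    if count <= 0:
        return 0
    # The first access pays the full latency; each later access adds one
    # throughput interval because the two LSU stages overlap.
    return latency + (count - 1) * throughput
```

Under these assumptions, four consecutive loads finish in five cycles rather than eight.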
6.4.6
Effect of Operand Placement on Performance
The PowerPC VEA states that the placement (location and alignment) of operands in memory may affect the relative performance of memory accesses, and in some cases affect it significantly. The effects memory operand placement has on performance are shown in Table 6-1. The best performance is guaranteed if memory operands are aligned on natural boundaries. For the best performance across the widest range of implementations, the programmer should assume the performance model described in Chapter 3, "Operand Conventions," in the Programming Environments Manual. The effect of misalignment on memory access latency is the same for big- and little-endian addressing modes except for multiple and string operations that cause an alignment exception in little-endian mode.
Table 6-1. Performance Effects of Memory Operand Placement

Operand                                Boundary Crossing
Size              Byte Alignment   None          8 Byte   Cache Block   Protection Boundary
Integer
4 byte            4                Optimal [1]   --       --            --
                  <4               Optimal       Good     Good          Good
2 byte            2                Optimal       --       --            --
                  <2               Optimal       Good     Good          Good
1 byte            1                Optimal       --       --            --
lmw, stmw [2]     4                Good [3]      Good     Good          Good
                  <4               Poor [4]      Poor     Poor          Poor
String            --               Good          Good     Good          Good
Floating-Point
8 byte            8                Optimal       --       --            --
                  4                --            Good     Good          Good
                  <4               --            Poor     Poor          Poor
4 byte            4                Optimal       --       --            --
                  <4               Poor          Poor     Poor          Poor

Notes:
1 Optimal means one EA calculation occurs.
2 Not supported in little-endian mode, causes an alignment exception.
3 Good means multiple EA calculations occur that may cause additional bus activities with multiple bus transfers.
4 Poor means that an alignment exception occurs.
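As a rough illustration of how the integer rows of Table 6-1 can be applied, the Python sketch below rates a scalar 1-, 2-, or 4-byte integer access. The function name is hypothetical, and the sketch covers only scalar integer operands (not lmw/stmw, string, or floating-point accesses). It relies on the fact that any cache block or protection boundary is also an 8-byte boundary, and all three crossings are rated Good in the table.

```python
def integer_access_rating(address, size):
    """Rate a scalar integer access as 'optimal' or 'good' per Table 6-1."""
    if size not in (1, 2, 4):
        raise ValueError("only 1-, 2-, and 4-byte integer operands are modeled")
    first_dword = address // 8
    last_dword = (address + size - 1) // 8
    # An access that stays inside one aligned double word needs a single EA
    # calculation; one that spans two double words needs multiple accesses.
    return "optimal" if first_dword == last_dword else "good"
```

For example, a misaligned word at address 0x1001 stays within one double word and is still rated optimal, while a word at 0x1006 crosses an 8-byte boundary and is rated good.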
6.4.7
Integer Store Gathering
The MPC750 performs store gathering for write-through operations to nonguarded space and for cache-inhibited stores to nonguarded space. Only 4-byte, word-aligned stores are gathered. These stores are combined in the LSU to form a double word and are sent out on the 60x bus as a single-beat operation. However, stores are gathered only if the successive stores meet the criteria and are queued and pending. Store gathering occurs regardless of the address order of the stores. Store gathering is enabled by setting HID0[SGE]. Stores can be gathered in both endian modes. Store gathering is not done for the following:
* Cacheable store operations
* Stores to guarded cache-inhibited or write-through space
* Byte-reverse store operations
* stwcx. instructions
* ecowx instructions
* A store that occurs during a table search operation
* Floating-point store operations
If store gathering is enabled and the stores do not fall under the above categories, an eieio or sync instruction must be used to prevent two stores from being gathered.
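The gathering criteria can be summarized as a single predicate. The Python sketch below is one possible reading, not a description of the hardware logic: the function name, parameter names, and mode labels are hypothetical, and it interprets the 4-byte, word-aligned restriction as applying to both write-through and cache-inhibited stores.

```python
def may_gather(size, address, mode, guarded=False, excluded_kind=False,
               sge_enabled=True):
    """Return True if a store meets the gathering criteria described above.

    mode is "write_through", "cache_inhibited", or "write_back" (labels are
    illustrative). excluded_kind covers byte-reverse, stwcx., ecowx,
    table-search, and floating-point stores.
    """
    if not sge_enabled or guarded or excluded_kind:
        return False
    if mode not in ("write_through", "cache_inhibited"):
        return False
    # Only 4-byte, word-aligned stores are candidates for gathering.
    return size == 4 and address % 4 == 0
```

Two adjacent stores that both satisfy this check and are queued and pending could be combined; placing eieio or sync between them prevents that.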
6.4.8
System Register Unit Execution Timing
Most instructions executed by the SRU either directly access renamed registers or access or modify nonrenamed registers. They generally execute in a serial manner. Results from these instructions are not available to subsequent instructions until the instruction completes and is retired. See Section 6.3.3.2, "Instruction Serialization," for more information on serializing instructions executed by the SRU, and refer to Table 6-4 and Table 6-5 for SRU instruction execution timings.
6.5
Memory Performance Considerations
Because the MPC750 can have a maximum instruction throughput of three instructions per clock cycle, lack of memory bandwidth can affect performance. For the MPC750 to maximize performance, it must be able to read and write data efficiently. If a system has multiple bus devices, one of them may experience long memory latencies while another bus master (for example, a DMA controller) is using the external bus.
6.5.1
Caching and Memory Coherency
To minimize the effect of bus contention, the PowerPC architecture defines WIM bits that are used to configure memory regions as caching-enforced or caching-inhibited. Accesses to caching-inhibited memory locations never update the on-chip cache. If a cache-inhibited access hits the on-chip cache, the cache block is invalidated. If the cache block is marked modified, it is copied back to memory before being invalidated. Where caching is permitted, memory is configured as either write-back or write-through, which are described as follows:
* Write-back--Configuring a memory region as write-back lets a processor modify data in the cache without updating system memory. For such locations, memory updates occur only on modified cache block replacements, cache flushes, or when one processor needs data that is modified in another's cache. Therefore, configuring memory as write-back can help when bus traffic could cause bottlenecks, especially for multiprocessor systems and for regions in which data, such as local variables, is used often and is coupled closely to a processor. If multiple devices use data in a memory region marked write-through, snooping must be enabled to allow the copy-back and cache invalidation operations necessary to ensure cache coherency. The MPC750's snooping hardware keeps other devices from accessing invalid data. When snooping is enabled, the MPC750 monitors transactions of other bus devices. For example, if another device needs data that is modified in the MPC750's cache, the access is delayed so the MPC750 can copy the modified data to memory.
* Write-through--Store operations to memory marked write-through always update both system memory and the on-chip cache on cache hits. Because valid cache contents always match system memory marked write-through, cache hits from other devices do not cause modified data to be copied back as they do for locations marked write-back. However, all write operations are passed to the bus, which can limit performance. Load operations that miss the on-chip cache must wait for the external store operation. Write-through configuration is useful when cached data must agree with external memory (for example, video memory), when shared (global) data may be needed often, or when it is undesirable to allocate a cache block on a cache miss.
Chapter 3, "L1 Instruction and Data Cache Operation," describes the caches, memory configuration, and snooping in detail.
6.5.2
Effect of TLB Miss
If a page address translation is not in a TLB, the MPC750 hardware searches the page tables and updates the TLB when a translation is found. Table 6-2 shows the estimated latency for the hardware TLB load for different cache configurations and conditions.
Table 6-2. TLB Miss Latencies
L1 Condition             L2 Condition      Processor/L2   Processor/System Bus      Estimated Latency
(Instruction and Data)                     Clock Ratio    Clock Ratio               (Cycles)
100% cache hit           --                --             --                        7
100% cache miss          100% cache hit    1:1            --                        13
100% cache miss          100% cache hit    1.5:1          --                        18
100% cache miss          100% cache hit    2:1            --                        20
100% cache miss          100% cache miss   1:1            2.5:1 (6:3:3:3 memory)    62
100% cache miss          100% cache miss   1:1            4:1 (5:2:2:2 memory)      77
The PTE table search assumes a hit in the first entry of the primary PTEG.
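To see how these estimates might feed a performance budget, the Python sketch below multiplies a TLB miss rate by the Table 6-2 latency for a chosen condition. The dictionary keys, the function name, and any miss rate passed in are illustrative assumptions; only the cycle counts come from the table.

```python
# Estimated hardware TLB-load latencies from Table 6-2, in processor cycles.
# The keys are illustrative labels, not names used by the manual.
TLB_MISS_CYCLES = {
    "l1_hit": 7,
    "l2_hit_1:1": 13,
    "l2_hit_1.5:1": 18,
    "l2_hit_2:1": 20,
    "l2_miss_bus_2.5:1": 62,
    "l2_miss_bus_4:1": 77,
}

def average_tlb_penalty(miss_rate, condition):
    """Average extra cycles per translated access for a given TLB miss rate."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return miss_rate * TLB_MISS_CYCLES[condition]
```

Because the table assumes a hit in the first entry of the primary PTEG, results from this sketch are best treated as a lower bound.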
6.6
Instruction Scheduling Guidelines
The performance of the MPC750 can be improved by avoiding resource conflicts and scheduling instructions to take fullest advantage of the parallel execution units. Instruction scheduling on the MPC750 can be improved by observing the following guidelines:
* To reduce mispredictions, separate the instruction that sets CR bits from the branch instruction that evaluates them. Because there can be no more than 12 instructions in the processor (with the instruction that sets CR in CQ0 and the dependent branch instruction in IQ5), there is no advantage to having more than 10 instructions between them.
* Likewise, when branching to a location specified by the CTR or LR, separate the mtspr instruction that initializes the CTR or LR from the dependent branch instruction. This ensures the register values are immediately available to the branch instruction.
* Schedule instructions such that two can be dispatched at a time.
* Schedule instructions to minimize stalls due to execution units being busy.
* Avoid scheduling high-latency instructions close together. Interspersing single-cycle latency instructions between longer-latency instructions minimizes the effect that instructions such as integer divide and multiply can have on throughput.
* Avoid using serializing instructions.
* Schedule instructions to avoid dispatch stalls:
-- Six instructions can be tracked in the completion queue; therefore, only six instructions can be in the execute stages at any one time.
-- There are six GPR rename registers; therefore only six GPRs can be specified as destination operands at any time. If no rename registers are available, instructions cannot enter the execute stage and remain in the reservation station or instruction queue until they become available. Note that load with update address instructions use two destination registers.
-- Similarly, there are six FPR rename registers, so only six FPR destination operands can be in the execute and complete stages at any time.
6.6.1
Branch, Dispatch, and Completion Unit Resource Requirements
This section describes the specific resources required to avoid stalls during branch resolution, instruction dispatching, and instruction completion.
6.6.1.1
Branch Resolution Resource Requirements
The following is a list of branch instructions and the resources required to avoid stalling the fetch unit in the course of branch resolution:
* The bclr instruction requires LR availability.
* The bcctr instruction requires CTR availability.
* Branch and link instructions require shadow LR availability.
* The "branch conditional on counter decrement and the CR" condition requires CTR availability or the CR condition must be false, and the MPC750 cannot execute instructions after an unresolved predicted branch when the BPU encounters a branch.
* A branch conditional on CR condition cannot be executed following an unresolved predicted branch instruction.
6.6.1.2
Dispatch Unit Resource Requirements
The following is a list of resources required to avoid stalls in the dispatch unit. IQ[0] and IQ[1] are the two dispatch entries in the instruction queue:
* Requirements for dispatching from IQ[0] are as follows:
-- Needed execution unit available
-- Needed GPR rename registers available
-- Needed FPR rename registers available
-- Completion queue is not full.
-- A completion-serialized instruction is not being executed.
* Requirements for dispatching from IQ[1] are as follows:
-- Instruction in IQ[0] must dispatch.
-- Instruction dispatched by IQ[0] is not completion- or refetch-serialized.
-- Needed execution unit is available (after dispatch from IQ[0]).
-- Needed GPR rename registers are available (after dispatch from IQ[0]).
-- Needed FPR rename register is available (after dispatch from IQ[0]).
-- Completion queue is not full (after dispatch from IQ[0]).
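The dispatch requirements for IQ[0] and IQ[1] can be expressed as a small check. The Python sketch below is a simplified model under stated assumptions: the class, field, and function names are hypothetical, each instruction is assumed to name one specific execution unit (the real dispatcher can steer many integer instructions to either IU1 or IU2), and rename registers and completion queue slots are tracked as simple counts.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    unit: str                 # execution unit the instruction needs
    gpr_dests: int = 0        # GPR rename registers needed
    fpr_dests: int = 0        # FPR rename registers needed
    serialized: bool = False  # completion- or refetch-serialized

def dispatch_count(iq0, iq1, free_units, gpr_free, fpr_free, cq_free,
                   serializing_in_flight=False):
    """Return how many of IQ[0] and IQ[1] can dispatch this cycle (0 to 2)."""
    def fits(instr, units, gpr, fpr, cq):
        return (instr.unit in units and instr.gpr_dests <= gpr
                and instr.fpr_dests <= fpr and cq > 0)

    if serializing_in_flight or not fits(iq0, free_units, gpr_free,
                                         fpr_free, cq_free):
        return 0
    if iq0.serialized:
        return 1  # IQ[1] may not follow a serialized IQ[0] in the same cycle
    # Re-check IQ[1] against what is left after IQ[0] has dispatched.
    if fits(iq1, free_units - {iq0.unit}, gpr_free - iq0.gpr_dests,
            fpr_free - iq0.fpr_dests, cq_free - 1):
        return 2
    return 1
```

Here free_units is a Python set of unit names, for example {"IU1", "IU2", "LSU"}. Two instructions that need the same unit, or that together need more rename registers than are free, dispatch one per cycle.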
6.6.1.3
Completion Unit Resource Requirements
The following is a list of resources required to avoid stalls in the completion unit; note that the two completion entries are described as CQ[0] and CQ[1], where CQ[0] is the entry located at the end of the completion queue (see Figure 6-4).
* Requirements for completing an instruction from CQ[0] are as follows:
-- Instruction in CQ[0] must be finished.
-- Instruction in CQ[0] must not follow an unresolved predicted branch.
-- Instruction in CQ[0] must not cause an exception.
* Requirements for completing an instruction from CQ[1] are as follows:
-- Instruction in CQ[0] must complete in same cycle.
-- Instruction in CQ[1] must be finished.
-- Instruction in CQ[1] must not follow an unresolved predicted branch.
-- Instruction in CQ[1] must not cause an exception.
-- Instruction in CQ[1] must be an integer or load instruction.
-- Number of CR updates from both CQ[0] and CQ[1] must not exceed two.
-- Number of GPR updates from both CQ[0] and CQ[1] must not exceed two.
-- Number of FPR updates from both CQ[0] and CQ[1] must not exceed two.
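The completion requirements for CQ[0] and CQ[1] can be checked in the same style. The Python sketch below is illustrative only; the class, field, and function names are hypothetical, and the instruction kind is a plain string label.

```python
from dataclasses import dataclass

@dataclass
class CQEntry:
    finished: bool
    kind: str = "integer"              # e.g. "integer", "load", "store", "fp"
    after_unresolved_branch: bool = False
    causes_exception: bool = False
    cr_updates: int = 0
    gpr_updates: int = 0
    fpr_updates: int = 0

def retire_count(cq0, cq1):
    """Return how many of CQ[0] and CQ[1] can complete this cycle (0 to 2)."""
    def ready(entry):
        return (entry.finished and not entry.after_unresolved_branch
                and not entry.causes_exception)

    if not ready(cq0):
        return 0
    # CQ[1] additionally has to be an integer or load instruction.
    if not ready(cq1) or cq1.kind not in ("integer", "load"):
        return 1
    # Combined CR, GPR, and FPR updates must each stay within two.
    for field in ("cr_updates", "gpr_updates", "fpr_updates"):
        if getattr(cq0, field) + getattr(cq1, field) > 2:
            return 1
    return 2
```

A load with update in CQ[0] (two GPR updates) followed by an add in CQ[1] (one GPR update) therefore completes over two cycles in this model.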
6.7
Instruction Latency Summary
Table 6-3 through Table 6-8 list latencies associated with instructions executed by each execution unit. Table 6-3 describes branch instruction latencies.
Table 6-3. Branch Instructions
Mnemonic    Primary   Extended
b[l][a]     18        --
bc[l][a]    16        --
bcctr[l]    19        528
bclr[l]     19        16

Latency (all four instructions): Unless these instructions update either the CTR or the LR, branch operations are folded if they are either taken or predicted as taken. They fall through if they are not taken or predicted as not taken.
Table 6-4 lists system register instruction latencies.
Table 6-4. System Register Instructions
Mnemonic              Primary   Extended   Unit   Cycles   Serialization
eieio                 31        854        SRU    1        --
isync                 19        150        SRU    2        Completion, refetch
mfmsr                 31        83         SRU    1        --
mfspr (DBATs)         31        339        SRU    3        Execution
mfspr (IBATs)         31        339        SRU    3        --
mfspr (not I/DBATs)   31        339        SRU    1        Execution
mfsr                  31        595        SRU    3        --
mfsrin                31        659        SRU    3        Execution
mftb                  31        371        SRU    1        --
mtmsr                 31        146        SRU    1        Execution
mtspr (DBATs)         31        467        SRU    2        Execution
mtspr (IBATs)         31        467        SRU    2        Execution
mtspr (not I/DBATs)   31        467        SRU    2        Execution
mtsr                  31        210        SRU    2        Execution
mtsrin                31        242        SRU    2        Execution
mttb                  31        467        SRU    1        Execution
rfi                   19        50         SRU    2        Completion, refetch
sc                    17        --1        SRU    2        Completion, refetch
sync                  31        598        SRU    3 [1]    --
tlbsync [2]           31        566        --     --       --

Notes:
1 This assumes no pending stores in the store queue. If there are, the sync completes after they complete to memory. If broadcast is enabled on the 60x bus, sync completes only after a successful broadcast.
2 tlbsync is dispatched only to the completion buffer (not to any execution unit) and is marked finished as it is dispatched. Upon retirement, it waits for an external TLBISYNC signal to be asserted. In most systems TLBISYNC is always asserted so the instruction is a no-op.
Table 6-5 lists condition register logical instruction latencies.
Table 6-5. Condition Register Logical Instructions
Mnemonic   Primary   Extended   Unit   Cycles   Serialization
crand      19        257        SRU    1        Execution
crandc     19        129        SRU    1        Execution
creqv      19        289        SRU    1        Execution
crnand     19        225        SRU    1        Execution
crnor      19        33         SRU    1        Execution
cror       19        449        SRU    1        Execution
crorc      19        417        SRU    1        Execution
crxor      19        193        SRU    1        Execution
mcrf       19        0          SRU    1        Execution
mcrxr      31        512        SRU    1        Execution
mfcr       31        19         SRU    1        Execution
mtcrf      31        144        SRU    1        Execution
Table 6-6 shows integer instruction latencies. Note that the IU1 executes all integer instructions--multiply, divide, shift, rotate, arithmetic, logical, and compare. The IU2 executes all integer instructions except multiply and divide (that is, shift, rotate, arithmetic, logical, and compare).
Table 6-6. Integer Instructions
Mnemonic       Primary   Extended   Unit      Cycles      Serialization
addc[o][.]     31        10         IU1/IU2   1           --
adde[o][.]     31        138        IU1/IU2   1           Execution
addi           14        --         IU1/IU2   1           --
addic          12        --         IU1/IU2   1           --
addic.         13        --         IU1/IU2   1           --
addis          15        --         IU1/IU2   1           --
addme[o][.]    31        234        IU1/IU2   1           Execution
addze[o][.]    31        202        IU1/IU2   1           Execution
add[o][.]      31        266        IU1/IU2   1           --
andc[.]        31        60         IU1/IU2   1           --
andi.          28        --         IU1/IU2   1           --
andis.         29        --         IU1/IU2   1           --
and[.]         31        28         IU1/IU2   1           --
cmp            31        0          IU1/IU2   1           --
cmpi           11        --         IU1/IU2   1           --
cmpl           31        32         IU1/IU2   1           --
cmpli          10        --         IU1/IU2   1           --
cntlzw[.]      31        26         IU1/IU2   1           --
divwu[o][.]    31        459        IU1       19          --
divw[o][.]     31        491        IU1       19          --
eqv[.]         31        284        IU1/IU2   1           --
extsb[.]       31        954        IU1/IU2   1           --
extsh[.]       31        922        IU1/IU2   1           --
mulhwu[.]      31        11         IU1       2,3,4,5,6   --
mulhw[.]       31        75         IU1       2,3,4,5     --
mulli          7         --         IU1       2,3         --
mull[o][.]     31        235        IU1       2,3,4,5     --
nand[.]        31        476        IU1/IU2   1           --
neg[o][.]      31        104        IU1/IU2   1           --
nor[.]         31        124        IU1/IU2   1           --
orc[.]         31        412        IU1/IU2   1           --
ori            24        --         IU1/IU2   1           --
oris           25        --         IU1/IU2   1           --
or[.]          31        444        IU1/IU2   1           --
rlwimi[.]      20        --         IU1/IU2   1           --
rlwinm[.]      21        --         IU1/IU2   1           --
rlwnm[.]       23        --         IU1/IU2   1           --
slw[.]         31        24         IU1/IU2   1           --
srawi[.]       31        824        IU1/IU2   1           --
sraw[.]        31        792        IU1/IU2   1           --
srw[.]         31        536        IU1/IU2   1           --
subfc[o][.]    31        8          IU1/IU2   1           --
subfe[o][.]    31        136        IU1/IU2   1           Execution
subfic         8         --         IU1/IU2   1           --
subfme[o][.]   31        232        IU1/IU2   1           Execution
subfze[o][.]   31        200        IU1/IU2   1           Execution
subf[.]        31        40         IU1/IU2   1           --
tw             31        4          IU1/IU2   2           --
twi            3         --         IU1/IU2   2           --
xori           26        --         IU1/IU2   1           --
xoris          27        --         IU1/IU2   1           --
xor[.]         31        316        IU1/IU2   1           --
Table 6-7 shows latencies for floating-point instructions. Pipelined floating-point instructions are shown with number of clocks in each pipeline stage separated by dashes. Floating-point instructions with a single entry in the cycles column are not pipelined; when the FPU executes these nonpipelined instructions, it remains busy for the full duration of the instruction execution and is not available for subsequent instructions.
Table 6-7. Floating-Point Instructions
Mnemonic      Primary   Extended   Unit   Cycles   Serialization
fabs[.]       63        264        FPU    1-1-1    --
fadds[.]      59        21         FPU    1-1-1    --
fadd[.]       63        21         FPU    1-1-1    --
fcmpo         63        32         FPU    1-1-1    --
fcmpu         63        0          FPU    1-1-1    --
fctiwz[.]     63        15         FPU    1-1-1    --
fctiw[.]      63        14         FPU    1-1-1    --
fdivs[.]      59        18         FPU    17       --
fdiv[.]       63        18         FPU    31       --
fmadds[.]     59        29         FPU    1-1-1    --
fmadd[.]      63        29         FPU    2-1-1    --
fmr[.]        63        72         FPU    1-1-1    --
fmsubs[.]     59        28         FPU    1-1-1    --
fmsub[.]      63        28         FPU    2-1-1    --
fmuls[.]      59        25         FPU    1-1-1    --
fmul[.]       63        25         FPU    2-1-1    --
fnabs[.]      63        136        FPU    1-1-1    --
fneg[.]       63        40         FPU    1-1-1    --
fnmadds[.]    59        31         FPU    1-1-1    --
fnmadd[.]     63        31         FPU    2-1-1    --
fnmsubs[.]    59        30         FPU    1-1-1    --
fnmsub[.]     63        30         FPU    2-1-1    --
fres[.]       59        24         FPU    10       --
frsp[.]       63        12         FPU    1-1-1    --
frsqrte[.]    63        26         FPU    1-1-1    --
fsel[.]       63        23         FPU    1-1-1    --
fsubs[.]      59        20         FPU    1-1-1    --
fsub[.]       63        20         FPU    1-1-1    --
mcrfs         63        64         FPU    1-1-1    Execution
mffs[.]       63        583        FPU    1-1-1    Execution
mtfsb0[.]     63        70         FPU    3        --
mtfsb1[.]     63        38         FPU    3        --
mtfsfi[.]     63        134        FPU    3        --
mtfsf[.]      63        711        FPU    3        --
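The Cycles notation in Table 6-7 can be reduced to a latency and a throughput figure. The Python sketch below is an interpretation rather than something the manual states: it assumes that latency is the sum of the per-stage clocks and that throughput is set by the slowest stage, which agrees with the three- or four-cycle latency and one- or two-cycle throughput quoted in Section 6.4.3. The function name is hypothetical.

```python
def fp_timing(cycles_entry):
    """Return (latency, throughput) for a Table 6-7 'Cycles' entry."""
    stages = [int(s) for s in cycles_entry.split("-")]
    if len(stages) == 1:
        # Nonpipelined: the FPU stays busy for the whole instruction.
        return stages[0], stages[0]
    # Pipelined: latency is the sum of the stage times, and a new
    # instruction can start once the slowest stage is free again.
    return sum(stages), max(stages)
```

With this reading, a double-precision fmadd (2-1-1) has a four-cycle latency and can issue every other cycle, while fdiv (31) occupies the FPU for 31 cycles.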
Table 6-8 shows load and store instruction latencies. Pipelined load/store instructions are shown with cycles of total latency and throughput cycles separated by a colon.
Table 6-8. Load and Store Instructions
Mnemonic   Primary   Extended   Unit   Cycles      Serialization
dcbf       31        86         LSU    3:5 [1]     Execution
dcbi       31        470        LSU    3:3 [1]     Execution
dcbst      31        54         LSU    3:5 [1]     Execution
dcbt       31        278        LSU    2:1         --
dcbtst     31        246        LSU    2:1         --
dcbz       31        1014       LSU    3:6 [1,2]   Execution
eciwx      31        310        LSU    2:1         --
ecowx      31        438        LSU    2:1         --
icbi       31        982        LSU    3:4 [1]     Execution
lbz        34        --         LSU    2:1         --
lbzu       35        --         LSU    2:1         --
lbzux      31        119        LSU    2:1         --
lbzx       31        87         LSU    2:1         --
lfd        50        --         LSU    2:1         --
lfdu       51        --         LSU    2:1         --
lfdux      31        631        LSU    2:1         --
lfdx       31        599        LSU    2:1         --
lfs        48        --         LSU    2:1         --
lfsu       49        --         LSU    2:1         --
lfsux      31        567        LSU    2:1         --
lfsx       31        535        LSU    2:1         --
lha        42        --         LSU    2:1         --
lhau       43        --         LSU    2:1         --
lhaux      31        375        LSU    2:1         --
lhax       31        343        LSU    2:1         --
lhbrx      31        790        LSU    2:1         --
lhz        40        --         LSU    2:1         --
lhzu       41        --         LSU    2:1         --
lhzux      31        311        LSU    2:1         --
lhzx       31        279        LSU    2:1         --
lmw        46        --         LSU    2+n [3]     Completion, execution
lswi       31        597        LSU    2+n [3]     Completion, execution
lswx       31        533        LSU    2+n [3]     Completion, execution
lwarx      31        20         LSU    3:1         Execution
lwbrx      31        534        LSU    2:1         --
lwz        32        --         LSU    2:1         --
lwzu       33        --         LSU    2:1         --
lwzux      31        55         LSU    2:1         --
lwzx       31        23         LSU    2:1         --
stb        38        --         LSU    2:1         --
stbu       39        --         LSU    2:1         --
stbux      31        247        LSU    2:1         --
stbx       31        215        LSU    2:1         --
stfd       54        --         LSU    2:1         --
stfdu      55        --         LSU    2:1         --
stfdux     31        759        LSU    2:1         --
stfdx      31        727        LSU    2:1         --
stfiwx     31        983        LSU    2:1         --
stfs       52        --         LSU    2:1         --
stfsu      53        --         LSU    2:1         --
stfsux     31        695        LSU    2:1         --
stfsx      31        663        LSU    2:1         --
sth        44        --         LSU    2:1         --
sthbrx     31        918        LSU    2:1         --
sthu       45        --         LSU    2:1         --
sthux      31        439        LSU    2:1         --
sthx       31        407        LSU    2:1         --
stmw       47        --         LSU    2+n [3]     Execution
stswi      31        725        LSU    2+n [3]     Execution
stswx      31        661        LSU    2+n [3]     Execution
stw        36        --         LSU    2:1         --
stwbrx     31        662        LSU    2:1         --
stwcx.     31        150        LSU    8:8         Execution
stwu       37        --         LSU    2:1         --
stwux      31        183        LSU    2:1         --
stwx       31        151        LSU    2:1         --
tlbie      31        306        LSU    3:4 [1]     Execution

Notes:
1 For cache-ops, the first number indicates the latency in finishing a single instruction; the second indicates the throughput for back-to-back cache-ops. Throughput may be larger than the initial latency as more cycles may be needed to complete the instruction to the cache, which stays busy keeping subsequent cache-ops from executing.
2 The throughput number of 6 cycles for dcbz assumes it is to nonglobal (M = 0) address space. For global address space, throughput is at least 11 cycles.
3 Load/store multiple/string instruction cycles are represented as a fixed number of cycles plus a variable number of cycles, where n is the number of words accessed by the instruction.
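The two Cycles notations used in Table 6-8 (latency:throughput, and 2+n for load/store multiple and string instructions) can be decoded with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, footnote markers are assumed to have been stripped from the entry, and no throughput is reported for the 2+n form because the table does not give one.

```python
def ls_timing(cycles_entry, n=0):
    """Return (latency, throughput) for a Table 6-8 'Cycles' entry.

    n is the number of words accessed by a multiple or string instruction.
    Throughput is None for the '2+n' form, which the table leaves unstated.
    """
    if ":" in cycles_entry:
        latency, throughput = cycles_entry.split(":")
        return int(latency), int(throughput)
    if cycles_entry.endswith("+n"):
        return int(cycles_entry[:-2]) + n, None
    raise ValueError("unrecognized cycles entry: " + cycles_entry)
```

For example, an lmw that loads eight words has an estimated latency of ten cycles under the 2+n rule.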
Chapter 7 Signal Descriptions
This chapter describes the MPC750 microprocessor's external signals. It contains a concise description of individual signals, showing behavior when the signal is asserted and negated and when the signal is an input and an output. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
NOTE
A bar over a signal name indicates that the signal is active low--for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active low, such as AP[0-3] (address bus parity signals) and TT[0-4] (transfer type signals) are referred to as asserted when they are high and negated when they are low.
The MPC750 signals are grouped as follows:
* Address arbitration--The MPC750 uses these signals to arbitrate for address bus mastership.
* Address transfer start--These signals indicate that a bus master has begun a transaction on the address bus.
* Address transfer--These signals include the address bus and address parity signals. They are used to transfer the address and to ensure the integrity of the transfer.
* Transfer attribute--These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited.
* Address transfer termination--These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated.
* Data arbitration--The MPC750 uses these signals to arbitrate for data bus mastership.
* Data transfer--These signals, which consist of the data bus and data parity, are used to transfer the data and to ensure the integrity of the transfer.
* Data transfer termination--Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, the data termination signals also indicate the end of the tenure; while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated.
* L2 cache address/data--The MPC750 has separate address and data buses for accessing the L2 cache (not supported in the MPC740).
* L2 cache clock/control--These signals provide clocking and control for the L2 cache (not supported in the MPC740).
* Interrupts/resets--These signals include the external interrupt signal, checkstop signals, and both soft reset and hard reset signals. They are used to interrupt and, under various conditions, to reset the processor.
* Processor status and control--These signals are used to set the reservation coherency bit, enable the time base, and other functions. They are also used in conjunction with such resources as secondary caches and the time base facility.
* Clock control--These signals determine the system clock frequency. They can also be used to synchronize multiprocessor systems.
* Test interface--The JTAG (IEEE 1149.1a-1993) interface and the common on-chip processor (COP) unit provide a serial interface to the system for performing board-level boundary-scan interconnect tests.
7.1
Signal Configuration
Figure 7-1 illustrates the MPC750's signal configuration, showing how the signals are grouped. A pinout showing pin numbers is included in the MPC750 hardware specifications.
[Figure 7-1: block diagram of the MPC750 with its signals arranged by group]
Address Arbitration: BR, BG, ABB
Address Start: TS
Address Bus: A[0-31], AP[0-3]
Transfer Attributes: TT[0-4], TBST, TSIZ[0-2], GBL, WT, CI
Address Termination: AACK, ARTRY
Data Arbitration: DBG, DBWO, DBB
Data Transfer: D[0-63], DP[0-7], DBDIS
Data Termination: TA, DRTRY, TEA
L2 Cache Address/Data (not supported in the MPC740): L2ADDR[16-0], L2DATA[0-63], L2DP[0-7]
L2 Cache Clock/Control (not supported in the MPC740): L2CE, L2WE, L2CLK_OUT[A-B], L2SYNC_OUT, L2SYNC_IN, L2ZZ
Interrupts/Resets: INT, SMI, MCP, SRESET, HRESET, CKSTP_IN, CKSTP_OUT
Processor Status/Control: RSRV, TBEN, TLBISYNC, QREQ, QACK
Clock Control: SYSCLK, PLL_CFG[0-3], CLK_OUT
Test Interface: JTAG/COP, Factory Test
Power: VDD, VDD (I/O), AVDD, L2VDD, L2AVDD
Figure 7-1. MPC750 Signal Groups
7.2
Signal Descriptions
This section describes individual MPC750 signals, grouped according to Figure 7-1. Note that the following sections summarize signal functions. Chapter 8, "System Interface Operation," describes many of these signals in greater detail, both with respect to how individual signals function and how groups of signals interact.
7.2.1
Address Bus Arbitration Signals
The address arbitration signals are input and output signals the MPC750 uses to request the address bus, recognize when the request is granted, and indicate to other devices when mastership is granted. For a detailed description of how these signals interact, see Section 8.3.1, "Address Bus Arbitration."
7.2.1.1
Bus Request (BR)--Output
Following are the state meaning and timing comments for the BR output signal.
State Meaning
Asserted--Indicates that the MPC750 is requesting mastership of the address bus. Note that BR may be asserted for one or more cycles, and then de-asserted due to an internal cancellation of the bus request (for example, due to a load hit in the touch load buffer). See Section 8.3.1, "Address Bus Arbitration."
Negated--Indicates that the MPC750 is not requesting the address bus. The MPC750 may have no bus operation pending, it may be parked, or the ARTRY input was asserted on the previous bus clock cycle.
Timing Comments
Assertion--Occurs when the MPC750 is not parked and a bus transaction is needed. This may occur even if the two possible pipeline accesses have occurred. BR will also be asserted for one cycle during the execution of a dcbz instruction, and during the execution of a load instruction which hits in the touch load buffer.
Negation--Occurs for at least one bus clock cycle after an accepted, qualified bus grant (see BG and ABB), even if another transaction is pending. It is also negated for at least one bus clock cycle when the assertion of ARTRY is detected on the bus.
7.2.1.2
Bus Grant (BG)--Input
Following are the state meaning and timing comments for the BG input signal.
State Meaning
Asserted--Indicates that the MPC750 may, with proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are not asserted the bus cycle following the assertion of AACK. The ABB and ARTRY signals are driven by the MPC750 or other bus masters. If the MPC750 is parked, BR need not be asserted for the qualified bus grant. See Section 8.3.1, "Address Bus Arbitration."
Negated--Indicates that the MPC750 is not the next potential address bus master.
Timing Comments
Assertion--May occur at any time to indicate the MPC750 can use the address bus. After the MPC750 assumes bus mastership, it does not check for a qualified bus grant again until the cycle during which the address bus tenure completes (assuming it has another transaction to run). The MPC750 does not accept a BG in the cycles between the assertion of any TS and AACK.
Negation--May occur at any time to indicate the MPC750 cannot use the bus. The MPC750 may still assume bus mastership on the bus clock cycle of the negation of BG because during the previous cycle BG indicated to the MPC750 that it could take mastership (if qualified).
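The qualified bus grant condition can be written as a short predicate, which also shows how active-low levels map onto the asserted/negated terminology. The Python sketch below is a simplification for illustration: the function names are hypothetical, and it evaluates a single bus cycle only, ignoring the timing relationship to AACK that a real bus model would need.

```python
def is_asserted(pin_is_high, active_low=True):
    """Translate an electrical level into the manual's asserted/negated terms."""
    return (not pin_is_high) if active_low else pin_is_high

def qualified_bus_grant(bg_high, abb_high, artry_high):
    """True when BG is asserted while ABB and ARTRY are both negated.

    BG, ABB, and ARTRY are all active-low signals, so 'asserted' means low.
    """
    return (is_asserted(bg_high)
            and not is_asserted(abb_high)
            and not is_asserted(artry_high))
```

For instance, BG low with ABB and ARTRY both high counts as a qualified grant, whereas BG low with ABB also low does not, because another master still owns the address bus.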
7.2.1.3
Address Bus Busy (ABB)
The address bus busy (ABB) signal is both an input and an output signal.
7.2.1.3.1 Address Bus Busy (ABB)--Output
Following are the state meaning and timing comments for the ABB output signal.
State Meaning
Asserted--Indicates that the MPC750 is the address bus master. See Section 8.3.1, "Address Bus Arbitration."
Negated--Indicates that the MPC750 is not using the address bus. If ABB is negated during the bus clock cycle following a qualified bus grant, the MPC750 did not accept mastership even if BR was asserted. This can occur if a potential transaction is aborted internally before the transaction begins.
Timing Comments
Assertion--Occurs on the bus clock cycle following a qualified BG that is accepted by the processor (see Negated).
Negation--Occurs for a minimum of one-half bus clock cycle following the assertion of AACK. If ABB is negated during the bus clock cycle after a qualified bus grant, the MPC750 did not accept mastership, even if BR was asserted.
High Impedance--Occurs after ABB is negated.
7.2.1.3.2 Address Bus Busy (ABB)--Input
Following are the state meaning and timing comments for the ABB input signal.
State Meaning
Asserted--Indicates that the address bus is in use. This condition effectively blocks the MPC750 from assuming address bus ownership, regardless of the BG input; see Section 8.3.1, "Address Bus Arbitration."
Negated--Indicates that the address bus is not owned by another bus master and that it is available to the MPC750 when accompanied by a qualified bus grant.
Timing Comments
Assertion--May occur when the MPC750 must be kept from using the address bus (and the processor is not currently asserting ABB).
Negation--May occur whenever the MPC750 can use the address bus.
7.2.2
Address Transfer Start Signals
Address transfer start signals are input and output signals that indicate that an address bus transfer has begun. The transfer start (TS) signal identifies the operation as a memory transaction. For detailed information about how TS interacts with other signals, refer to Section 8.3.2, "Address Transfer."
7.2.2.1
Transfer Start (TS)
The TS signal is both an input and an output signal on the MPC750.
7.2.2.1.1
Transfer Start (TS)--Output
Following are the state meaning and timing comments for the TS output signal.
State Meaning
Asserted--Indicates that the MPC750 has begun a memory bus transaction and that the address bus and transfer attribute signals are valid. When asserted with the appropriate TT[0-4] signals it is also an implied data bus request for a memory transaction (unless it is an address-only operation).
Negated--Indicates that no bus transaction is occurring during normal operation.
Timing Comments
Assertion--Coincides with the assertion of ABB.
Negation--Occurs one bus clock cycle after TS is asserted.
High Impedance--Coincides with the negation of ABB.
7.2.2.1.2
Transfer Start (TS)--Input
Following are the state meaning and timing comments for the TS input signal.
State Meaning
Asserted--Indicates that another master has begun a bus transaction and that the address bus and transfer attribute signals are valid for snooping (see GBL).
Negated--Indicates that no bus transaction is occurring.
Timing Comments
Assertion--May occur during the assertion of ABB.
Negation--Must occur one bus clock cycle after TS is asserted.
7.2.3
Address Transfer Signals
The address transfer signals are used to transmit the address and to generate and monitor parity for the address transfer. For a detailed description of how these signals interact, refer to Section 8.3.2, "Address Transfer."
7.2.3.1
Address Bus (A[0-31])
The address bus (A[0-31]) consists of 32 signals that are both input and output signals.
7.2.3.1.1
Address Bus (A[0-31])--Output
Following are the state meaning and timing comments for the A[0-31] output signals.
State Meaning
Asserted/Negated--Represents the physical address (real address in the architecture specification) of the data to be transferred. On burst transfers, the address bus presents the double-word-aligned address containing the critical code/data that missed the cache on a read operation, or the first double word of the cache line on a write operation. Note that the address output during burst operations is not incremented. See Section 8.3.2, "Address Transfer."
Timing Comments
Assertion/Negation--Occurs on the bus clock cycle after a qualified bus grant (coincides with assertion of ABB and TS).
High Impedance--Occurs one bus clock cycle after AACK is asserted.
7.2.3.1.2
Address Bus (A[0-31])--Input
Following are the state meaning and timing comments for the A[0-31] input signals.
State Meaning
Asserted/Negated--Represents the physical address of a snoop operation.
Timing Comments
Assertion/Negation--Must occur on the same bus clock cycle as the assertion of TS; is sampled by MPC750 only on this cycle.
7.2.3.2
Address Bus Parity (AP[0-3])
The address bus parity (AP[0-3]) signals are both input and output signals reflecting one bit of odd-byte parity for each of the 4 bytes of address when a valid address is on the bus.
7.2.3.2.1
Address Bus Parity (AP[0-3])--Output
Following are the state meaning and timing comments for the AP[0-3] output signals on the MPC750.
State Meaning
Asserted/Negated--Represents odd parity for each of the 4 bytes of the physical address for a transaction. Odd parity means that an odd number of bits, including the parity bit, are driven high. The signal assignments correspond to the following:
AP0    A[0-7]
AP1    A[8-15]
AP2    A[16-23]
AP3    A[24-31]
For more information, see Section 8.3.2.1, "Address Bus Parity."
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
7.2.3.2.2
Address Bus Parity (AP[0-3])--Input
Following are the state meaning and timing comments for the AP[0-3] input signals on the MPC750.
State Meaning
Asserted/Negated--Represents odd parity for each of the 4 bytes of the physical address for snooping operations. Detected even parity causes the processor to take a machine check exception or enter the checkstop state if address parity checking is enabled in the HID0 register; see Section 2.1.2.2, "Hardware Implementation-Dependent Register 0."
Timing Comments
Assertion/Negation--The same as A[0-31].
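The odd-parity rule for AP[0-3] can be illustrated with a short Python sketch. This is a behavioral illustration only, not part of the MPC750 documentation: the function names odd_parity_bit and address_parity are hypothetical, and the sketch assumes the PowerPC bit-numbering convention in which A0 is the most-significant address bit, so that AP0 covers the most-significant byte.

```python
def odd_parity_bit(byte):
    """Return the parity bit that gives the byte plus parity an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1")
    # If the byte already holds an odd number of 1s, the parity bit is 0.
    return 1 if ones % 2 == 0 else 0


def address_parity(addr):
    """Return [AP0, AP1, AP2, AP3] for a 32-bit physical address.

    Assumption: A0 is the most-significant bit, so AP0 covers A[0-7]
    (the top byte) and AP3 covers A[24-31] (the bottom byte).
    """
    return [odd_parity_bit((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0)]
```

For an all-zero address every byte contains no 1s, so all four parity bits are driven high. A snooping device would recompute the same bits and treat a mismatch (even parity) as an error.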
7.2.4
Address Transfer Attribute Signals
The transfer attribute signals are a set of signals that further characterize the transfer--such as the size of the transfer, whether it is a read or write operation, and whether it is a burst or single-beat transfer. For a detailed description of how these signals interact, see Section 8.3.2, "Address Transfer." Note that some signal functions vary depending on whether the transaction is a memory access or an I/O access.
7.2.4.1
Transfer Type (TT[0-4])
The transfer type (TT[0-4]) signals consist of five input/output signals on the MPC750. For a complete description of TT[0-4] signals and for transfer type encodings, see Table 7-1.
7.2.4.1.1
Transfer Type (TT[0-4])--Output
Following are the state meaning and timing comments for the TT[0-4] output signals on the MPC750.
State Meaning
Asserted/Negated--Indicates the type of transfer in progress.
Timing Comments
Assertion/Negation/High Impedance--The same as A[0-31].
7.2.4.1.2
Transfer Type (TT[0-4])--Input
Following are the state meaning and timing comments for the TT[0-4] input signals on the MPC750.
State Meaning
Asserted/Negated--Indicates the type of transfer in progress (see Table 7-2).
Timing Comments
Assertion/Negation--The same as A[0-31].
Table 7-1 describes the transfer encodings for an MPC750 bus master.
Table 7-1. Transfer Type Encodings for MPC750 Bus Master
MPC750 Bus Master Transaction  Transaction Source                           TT0 TT1 TT2 TT3 TT4  60x Bus Specification Command      Transaction
Address only (1)               dcbst                                        0   0   0   0   0    Clean block                        Address only
Address only (1)               dcbf                                         0   0   1   0   0    Flush block                        Address only
Address only (1)               sync                                         0   1   0   0   0    sync                               Address only
Address only (1)               dcbz or dcbi                                 0   1   1   0   0    Kill block                         Address only
Address only (1)               eieio                                        1   0   0   0   0    eieio                              Address only
Single-beat write (nonGBL)     ecowx                                        1   0   1   0   0    External control word write        Single-beat write
N/A                            N/A                                          1   1   0   0   0    TLB invalidate                     Address only
Single-beat read (nonGBL)      eciwx                                        1   1   1   0   0    External control word read         Single-beat read
N/A                            N/A                                          0   0   0   0   1    lwarx reservation set              Address only
N/A                            N/A                                          0   0   1   0   1    Reserved                           --
N/A                            N/A                                          0   1   0   0   1    tlbsync                            Address only
N/A                            N/A                                          0   1   1   0   1    icbi                               Address only
N/A                            N/A                                          1   X   X   0   1    Reserved                           --
Single-beat write              Caching-inhibited or write-through store     0   0   0   1   0    Write-with-flush                   Single-beat write or burst
Burst (nonGBL)                 Cast-out, or snoop copyback                  0   0   1   1   0    Write-with-kill                    Burst
Single-beat read               Caching-inhibited load or instruction fetch  0   1   0   1   0    Read                               Single-beat read or burst
Burst                          Load miss, store miss, or instruction fetch  0   1   1   1   0    Read-with-intent-to-modify         Burst
Single-beat write              stwcx.                                       1   0   0   1   0    Write-with-flush-atomic            Single-beat write
N/A                            N/A                                          1   0   1   1   0    Reserved                           N/A
Single-beat read               lwarx (caching-inhibited load)               1   1   0   1   0    Read-atomic                        Single-beat read or burst
Burst                          lwarx (load miss)                            1   1   1   1   0    Read-with-intent-to-modify-atomic  Burst
N/A                            N/A                                          0   0   0   1   1    Reserved                           --
N/A                            N/A                                          0   0   1   1   1    Reserved                           --
N/A                            N/A                                          0   1   0   1   1    Read-with-no-intent-to-cache       Single-beat read or burst
N/A                            N/A                                          0   1   1   1   1    Reserved                           --
N/A                            N/A                                          1   X   X   1   1    Reserved                           --
Note: (1) Address-only transaction occurs if enabled by setting HID0[ABE] bit to 1.
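The TT[0-4] encodings of Table 7-1 lend themselves to a simple lookup. The following Python sketch is an illustration only; the names TT_COMMANDS and decode_tt are hypothetical, and the encodings are written as five-character strings with TT0 first and "X" for a don't-care bit.

```python
# (pattern, 60x bus specification command); TT0 is the leftmost character.
TT_COMMANDS = [
    ("00000", "Clean block"),
    ("00100", "Flush block"),
    ("01000", "sync"),
    ("01100", "Kill block"),
    ("10000", "eieio"),
    ("10100", "External control word write"),
    ("11000", "TLB invalidate"),
    ("11100", "External control word read"),
    ("00001", "lwarx reservation set"),
    ("00101", "Reserved"),
    ("01001", "tlbsync"),
    ("01101", "icbi"),
    ("1XX01", "Reserved"),
    ("00010", "Write-with-flush"),
    ("00110", "Write-with-kill"),
    ("01010", "Read"),
    ("01110", "Read-with-intent-to-modify"),
    ("10010", "Write-with-flush-atomic"),
    ("10110", "Reserved"),
    ("11010", "Read-atomic"),
    ("11110", "Read-with-intent-to-modify-atomic"),
    ("00011", "Reserved"),
    ("00111", "Reserved"),
    ("01011", "Read-with-no-intent-to-cache"),
    ("01111", "Reserved"),
    ("1XX11", "Reserved"),
]


def decode_tt(bits):
    """Map a TT[0-4] string such as "01010" to its 60x command name."""
    if len(bits) != 5 or any(b not in "01" for b in bits):
        raise ValueError("expected five characters, each '0' or '1'")
    for pattern, command in TT_COMMANDS:
        # A pattern position matches if it is a don't-care or equals the bit.
        if all(p in ("X", b) for p, b in zip(pattern, bits)):
            return command
    raise ValueError("no matching encoding")
```

Because the two don't-care rows cover the remaining combinations, every one of the 32 possible TT[0-4] values resolves to an entry.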
Table 7-2 describes the 60x bus specification transfer encodings and the MPC750 bus snoop response on an address hit.
Table 7-2. MPC750 Snoop Hit Response
60x Bus Specification Command      Transaction                 TT0 TT1 TT2 TT3 TT4  MPC750 Bus Snooper; Action on Hit
Clean block                        Address only                0   0   0   0   0    N/A
Flush block                        Address only                0   0   1   0   0    N/A
sync                               Address only                0   1   0   0   0    N/A
Kill block                         Address only                0   1   1   0   0    Flush, cancel reservation
eieio                              Address only                1   0   0   0   0    N/A
External control word write        Single-beat write           1   0   1   0   0    N/A
TLB Invalidate                     Address only                1   1   0   0   0    N/A
External control word read         Single-beat read            1   1   1   0   0    N/A
lwarx reservation set              Address only                0   0   0   0   1    N/A
Reserved                           --                          0   0   1   0   1    N/A
tlbsync                            Address only                0   1   0   0   1    N/A
icbi                               Address only                0   1   1   0   1    N/A
Reserved                           --                          1   X   X   0   1    N/A
Write-with-flush                   Single-beat write or burst  0   0   0   1   0    Flush, cancel reservation
Write-with-kill                    Single-beat write or burst  0   0   1   1   0    Kill, cancel reservation
Read                               Single-beat read or burst   0   1   0   1   0    Clean or flush
Read-with-intent-to-modify         Burst                       0   1   1   1   0    Flush
Write-with-flush-atomic            Single-beat write           1   0   0   1   0    Flush, cancel reservation
Reserved                           N/A                         1   0   1   1   0    N/A
Read-atomic                        Single-beat read or burst   1   1   0   1   0    Clean or flush
Read-with-intent-to-modify-atomic  Burst                       1   1   1   1   0    Flush
Reserved                           --                          0   0   0   1   1    N/A
Reserved                           --                          0   0   1   1   1    N/A
Read-with-no-intent-to-cache       Single-beat read or burst   0   1   0   1   1    Clean
Reserved                           --                          0   1   1   1   1    N/A
Reserved                           --                          1   X   X   1   1    N/A
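Only nine of the encodings in Table 7-2 cause a snoop action on a hit; all others are listed as N/A. A minimal Python sketch of that lookup follows. The names SNOOP_ACTIONS and snoop_action are hypothetical, the keys are TT[0-4] strings with TT0 first, and the sketch models only the table lookup, not cache state or bus timing.

```python
# TT[0-4] (TT0 first) -> action on a snoop hit, for the non-N/A rows only.
SNOOP_ACTIONS = {
    "01100": "Flush, cancel reservation",  # Kill block
    "00010": "Flush, cancel reservation",  # Write-with-flush
    "00110": "Kill, cancel reservation",   # Write-with-kill
    "01010": "Clean or flush",             # Read
    "01110": "Flush",                      # Read-with-intent-to-modify
    "10010": "Flush, cancel reservation",  # Write-with-flush-atomic
    "11010": "Clean or flush",             # Read-atomic
    "11110": "Flush",                      # Read-with-intent-to-modify-atomic
    "01011": "Clean",                      # Read-with-no-intent-to-cache
}


def snoop_action(tt_bits):
    """Return the Table 7-2 action for a snoop hit, or "N/A" if none is listed."""
    if len(tt_bits) != 5 or any(b not in "01" for b in tt_bits):
        raise ValueError("expected five characters, each '0' or '1'")
    return SNOOP_ACTIONS.get(tt_bits, "N/A")
```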
7.2.4.2
Transfer Size (TSIZ[0-2])--Output
Following are the state meaning and timing comments for the transfer size (TSIZ[0-2]) output signals on the MPC750.
State Meaning
Asserted/Negated--For memory accesses, these signals, along with TBST, indicate the data transfer size for the current bus operation, as shown in Table 7-3. Table 8-3 shows how the transfer size signals are used with the address signals for aligned transfers. Table 8-4 shows how the transfer size signals are used with the address signals for misaligned transfers. Note that the MPC750 does not generate all possible TSIZ[0-2] encodings. For external control instructions (eciwx and ecowx), TSIZ[0-2] are used to output bits 29-31 of the external access register (EAR), which are used to form the resource ID (TBST||TSIZ0-TSIZ2).
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
Table 7-3. Data Transfer Size
TBST      TSIZ[0-2]  Transfer Size
Asserted  010        Burst (32 bytes)
Negated   000        8 bytes
Negated   001        1 byte
Negated   010        2 bytes
Negated   011        3 bytes
Negated   100        4 bytes
Negated   101        5 bytes (1)
Negated   110        6 bytes (1)
Negated   111        7 bytes (1)
Note: (1) Not generated by MPC750.
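As a quick illustration of Table 7-3, the following Python sketch maps the TBST state and the TSIZ[0-2] value to a transfer size in bytes. The function name transfer_size_bytes is hypothetical, TBST is modeled as a boolean meaning "asserted" rather than as a pin level, and combinations that the table does not list are rejected.

```python
# TSIZ[0-2] value -> bytes transferred when TBST is negated (single beat).
_SINGLE_BEAT_BYTES = {
    0b000: 8, 0b001: 1, 0b010: 2, 0b011: 3,
    0b100: 4, 0b101: 5, 0b110: 6, 0b111: 7,
}


def transfer_size_bytes(tbst_asserted, tsiz):
    """Return the transfer size in bytes for a TBST/TSIZ[0-2] combination."""
    if not 0 <= tsiz <= 0b111:
        raise ValueError("TSIZ[0-2] is a 3-bit value")
    if tbst_asserted:
        # Table 7-3 lists only TSIZ = 010 together with TBST asserted.
        if tsiz != 0b010:
            raise ValueError("combination not listed in Table 7-3")
        return 32
    return _SINGLE_BEAT_BYTES[tsiz]
```

Note that the sizes of 5, 6, and 7 bytes are decodable but, per the table note, are not generated by the MPC750.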
7.2.4.3
Transfer Burst (TBST)
The transfer burst (TBST) signal is an input/output signal on the MPC750.
7.2.4.3.1
Transfer Burst (TBST)--Output
Following are the state meaning and timing comments for the TBST output signal.
State Meaning
Asserted--Indicates that a burst transfer is in progress.
Negated--Indicates that a burst transfer is not in progress. For external control instructions (eciwx and ecowx), TBST is used to output bit 28 of the EAR, which is used to form the resource ID (TBST||TSIZ0-TSIZ2).
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
7.2.4.3.2
Transfer Burst (TBST)--Input
Following are the state meaning and timing comments for the TBST input signal.
State Meaning
Asserted/Negated--Used when snooping for single-beat reads (read with no intent to cache).
Timing Comments
Assertion/Negation--The same as A[0-31].
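For eciwx and ecowx, the resource ID is the concatenation TBST||TSIZ0-TSIZ2 taken from EAR bits 28-31. The following Python sketch shows one way to split an EAR value accordingly. It is illustrative only: the function name resource_id_fields is hypothetical, it assumes PowerPC bit numbering (bit 0 is the most-significant bit of the 32-bit register, so bits 28-31 are the four least-significant bits), and it returns logical bit values without modeling pin polarity.

```python
def resource_id_fields(ear):
    """Split EAR bits 28-31 into (TBST bit, TSIZ[0-2] value).

    Assumption: bit 0 is the MSB of the 32-bit EAR, so bits 28-31
    occupy the low four bits of the integer value.
    """
    rid = ear & 0xF              # EAR[28-31], the 4-bit resource ID
    tbst_bit = (rid >> 3) & 1    # EAR bit 28 is output on TBST
    tsiz = rid & 0b111           # EAR bits 29-31 are output on TSIZ[0-2]
    return tbst_bit, tsiz
```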
7.2.4.4
Cache Inhibit (CI)--Output
The cache inhibit (CI) signal is an output signal on the MPC750. Following are the state meaning and timing comments for the CI signal.
State Meaning
Asserted--Indicates that a single-beat transfer will not be cached, reflecting the setting of the I bit for the block or page that contains the address of the current transaction.
Negated--Indicates that a burst transfer will allocate an MPC750 data cache block.
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
7.2.4.5
Write-Through (WT)--Output
The write-through (WT) signal is an output signal on the MPC750. Following are the state meaning and timing comments for the WT signal.
State Meaning
Asserted--Indicates that a single-beat write transaction is write-through, reflecting the value of the W bit for the block or page that contains the address of the current transaction. Assertion during a read operation indicates instruction fetching.
Negated--Indicates that a write transaction is not write-through; during a read operation negation indicates a data load.
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
7.2.4.6
Global (GBL)
The global (GBL) signal is an input/output signal on the MPC750.
7.2.4.6.1
Global (GBL)--Output
Following are the state meaning and timing comments for the GBL output signal.
State Meaning
Asserted--Indicates that a transaction is global, reflecting the setting of the M bit for the block or page that contains the address of the current transaction (except in the case of copy-back operations and instruction fetches, which are nonglobal).
Negated--Indicates that a transaction is not global.
Timing Comments
Assertion/Negation--The same as A[0-31].
High Impedance--The same as A[0-31].
7.2.4.6.2
Global (GBL)--Input
Following are the state meaning and timing comments for the GBL input signal.
State Meaning
Asserted--Indicates that a transaction must be snooped by the MPC750.
Negated--Indicates that a transaction is not snooped by the MPC750.
Timing Comments
Assertion/Negation--The same as A[0-31].
7.2.5
Address Transfer Termination Signals
The address transfer termination signals are used to indicate either that the address phase of the transaction has completed successfully or must be repeated, and when it should be terminated. For detailed information about how these signals interact, see Section 8.3.3, "Address Transfer Termination."
7.2.5.1
Address Acknowledge (AACK)--Input
The address acknowledge (AACK) signal is an input-only signal on the MPC750. Following are the state meaning and timing comments for the AACK signal.
State Meaning
Asserted--Indicates that the address phase of a transaction is complete. The address bus will go to a high-impedance state on the next bus clock cycle. The MPC750 samples ARTRY on the bus clock cycle following the assertion of AACK.
Negated--(During ABB) indicates that the address bus and the transfer attributes must remain driven.
Timing Comments
Assertion--May occur as early as the bus clock cycle after TS is asserted; assertion can be delayed to allow adequate address access time for slow devices. For example, if an implementation supports slow snooping devices, an external arbiter can postpone the assertion of AACK.
Negation--Must occur one bus clock cycle after the assertion of AACK.
7.2.5.2
Address Retry (ARTRY)
The address retry (ARTRY) signal is both an input and output signal on the MPC750.
7.2.5.2.1
Address Retry (ARTRY)--Output
Following are the state meaning and timing comments for the ARTRY output signal.
State Meaning
Asserted--Indicates that the MPC750 detects a condition in which a snooped address tenure must be retried. If the MPC750 needs to update memory as a result of the snoop that caused the retry, the MPC750 asserts BR the second cycle after AACK if ARTRY is asserted.
High Impedance--Indicates that the MPC750 does not need the snooped address tenure to be retried.
Timing Comments
Assertion--Asserted the second bus cycle following the assertion of TS if a retry is required.
Negation/HighZ--Driven until the bus_clk cycle following the assertion of AACK. Because this signal may be simultaneously driven by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for a minimum of one-half processor cycle (dependent on the clock mode), then it is driven negated for one-half bus cycle before returning to high impedance. This special method of negation may be disabled by setting precharge disable in HID0.
7.2.5.2.2
Address Retry (ARTRY)--Input
Following are the state meaning and timing comments for the ARTRY input signal.
State Meaning
Asserted--If the MPC750 is the address bus master, ARTRY indicates that the MPC750 must retry the preceding address tenure and immediately negate BR (if asserted). If the associated data tenure has already started, the MPC750 also aborts the data tenure immediately, even if the burst data has been received. If the MPC750 is not the address bus master, this input indicates that the MPC750 should immediately negate BR to allow an opportunity for a copy-back operation to main memory after a snooping bus master asserts ARTRY. Note that the subsequent address presented on the address bus may not be the same one associated with the assertion of the ARTRY signal.
Negated/High Impedance--Indicates that the MPC750 does not need to retry the last address tenure.
Timing Comments
Assertion--May occur as early as the second cycle following the assertion of TS, and must occur by the bus clock cycle immediately following the assertion of AACK if an address retry is required.
Negation--Must occur two bus clock cycles after the assertion of AACK.
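The timing comments for the ARTRY input can be expressed as a window of bus clock cycles relative to TS and AACK. The Python sketch below is an illustration under stated assumptions: the function name artry_input_window is hypothetical, cycles are counted as integers, and "the second cycle following the assertion of TS" is read as the TS cycle plus two.

```python
def artry_input_window(ts_cycle, aack_cycle):
    """Return (earliest, latest, negate_at) bus clock cycles for ARTRY.

    earliest:  second cycle following the assertion of TS (assumed TS + 2)
    latest:    cycle immediately following the assertion of AACK
    negate_at: two bus clock cycles after the assertion of AACK
    """
    # AACK may be asserted no earlier than the cycle after TS is asserted.
    if aack_cycle < ts_cycle + 1:
        raise ValueError("AACK cannot precede the cycle after TS")
    earliest = ts_cycle + 2
    latest = aack_cycle + 1
    negate_at = aack_cycle + 2
    return earliest, latest, negate_at
```

With the earliest possible AACK (one cycle after TS), the window collapses to a single cycle. Delaying AACK widens it, which is how a system accommodates slow snooping devices.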
7.2.6
Data Bus Arbitration Signals
Like the address bus arbitration signals, data bus arbitration signals maintain an orderly process for determining data bus mastership. Note that there is no data bus arbitration signal equivalent to the address bus arbitration signal BR (bus request), because, except for address-only transactions, TS implies data bus requests. For a detailed description on how these signals interact, see Section 8.4.1, "Data Bus Arbitration." One special signal, DBWO, allows the MPC750 to be configured dynamically to write data out of order with respect to read data. For detailed information about using DBWO, see Section 8.10, "Using Data Bus Write Only."
7.2.6.1
Data Bus Grant (DBG)--Input
The data bus grant (DBG) signal is an input-only signal on the MPC750. Following are the state meaning and timing comments for the DBG signal.
State Meaning
Asserted--Indicates that the MPC750 may, with the proper qualification, assume mastership of the data bus. The MPC750 derives a qualified data bus grant when DBG is asserted and DBB, DRTRY, and ARTRY are negated; that is, the data bus is not busy (DBB is negated), there is no outstanding attempt to retry the current data tenure (DRTRY is negated), and there is no outstanding attempt to perform an ARTRY of the associated address tenure.
Negated--Indicates that the MPC750 must hold off its data tenures.
Timing Comments
Assertion--May occur any time to indicate the MPC750 is free to take data bus mastership. It is not sampled until TS is asserted.
Negation--May occur at any time to indicate the MPC750 cannot assume data bus mastership.
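The qualification of a data bus grant reduces to a single Boolean expression. The following Python sketch is illustrative only; the function name qualified_data_bus_grant is hypothetical, and each argument is True when the corresponding signal is asserted (logical state, not pin level).

```python
def qualified_data_bus_grant(dbg, dbb, drtry, artry):
    """Return True when DBG is asserted and DBB, DRTRY, and ARTRY are negated."""
    return bool(dbg and not (dbb or drtry or artry))
```

For example, a grant that arrives while DBB is still asserted by another master is not qualified, so the data tenure is held off until DBB is negated.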
7.2.6.2
Data Bus Write Only (DBWO)--Input
The data bus write only (DBWO) signal is an input-only signal on the MPC750. Following are the state meaning and timing comments for the DBWO signal.
State Meaning
Asserted--Indicates that the MPC750 may run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. Refer to Section 8.10, "Using Data Bus Write Only," for detailed instructions for using DBWO.
Negated--Indicates that the MPC750 must run the data bus tenures in the same order as the address tenures.
Timing Comments
Assertion--Must occur no later than a qualified DBG for an outstanding write tenure. DBWO is sampled by the MPC750 on the clock of a qualified DBG. If no write requests are pending, the MPC750 will ignore DBWO and assume data bus ownership for the next pending read request.
Negation--May occur any time after a qualified DBG and before the next assertion of DBG.
7.2.6.3
Data Bus Busy (DBB)
The data bus busy (DBB) signal is both an input and output signal on the MPC750.
7.2.6.3.1
Data Bus Busy (DBB)--Output
Following are the state meaning and timing comments for the DBB output signal.
State Meaning
Asserted--Indicates that the MPC750 is the data bus master. The MPC750 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
Negated--Indicates that the MPC750 is not using the data bus.
Timing Comments
Assertion--Occurs during the bus clock cycle following a qualified DBG.
Negation--Occurs for a minimum of one-half bus clock cycle (dependent on clock mode) following the assertion of the final TA.
High Impedance--Occurs after DBB is negated.
7.2.6.3.2
Data Bus Busy (DBB)--Input
Following are the state meaning and timing comments for the DBB input signal.
State Meaning
Asserted--Indicates that another device is bus master.
Negated--Indicates that the data bus is free (with proper qualification, see DBG) for use by the MPC750.
Timing Comments
Assertion--Must occur when the MPC750 must be prevented from using the data bus.
Negation--May occur whenever the data bus is available.
7.2.7
Data Transfer Signals
Like the address transfer signals, the data transfer signals are used to transmit data and to generate and monitor parity for the data transfer. For a detailed description of how the data transfer signals interact, see Section 8.4.3, "Data Transfer."
7.2.7.1
Data Bus (DH[0-31], DL[0-31])
The data bus (DH[0-31] and DL[0-31]) consists of 64 signals that are both inputs and outputs on the MPC750. Following are the state meaning and timing comments for the DH and DL signals.
State Meaning
The data bus has two halves--data bus high (DH) and data bus low (DL). See Table 7-4 for the data bus lane assignments.
Timing Comments
The data bus is driven once for noncached transactions and four times for cache transactions (bursts).
Table 7-4. Data Bus Lane Assignments
Data Bus Signals  Byte Lane
DH[0-7]           0
DH[8-15]          1
DH[16-23]         2
DH[24-31]         3
DL[0-7]           4
DL[8-15]          5
DL[16-23]         6
DL[24-31]         7
7.2.7.1.1
Data Bus (DH[0-31], DL[0-31])--Output
Following are the state meaning and timing comments for the DH and DL output signals.
State Meaning
Asserted/Negated--Represents the state of data during a data write. Byte lanes not selected for data transfer will not supply valid data.
Timing Comments
Assertion/Negation--Initial beat coincides with DBB and, for bursts, transitions on the bus clock cycle following each assertion of TA.
High Impedance--Occurs on the bus clock cycle after the final assertion of TA, following the assertion of TEA, or in certain ARTRY cases.
7.2.7.1.2
Data Bus (DH[0-31], DL[0-31])--Input
Following are the state meaning and timing comments for the DH and DL input signals.
State Meaning
Asserted/Negated--Represents the state of data during a data read transaction.
Timing Comments
Assertion/Negation--Data must be valid on the same bus clock cycle that TA is asserted.
7.2.7.2
Data Bus Parity (DP[0-7])
The eight data bus parity (DP[0-7]) signals on the MPC750 are both output and input signals.
7.2.7.2.1
Data Bus Parity (DP[0-7])--Output
Following are the state meaning and timing comments for the DP output signals.
State Meaning
Asserted/Negated--Represents odd parity for each of the 8 bytes of data write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. The generation of parity is enabled through HID0. The signal assignments are listed in Table 7-5.
Timing Comments
Assertion/Negation--The same as DL[0-31].
High Impedance--The same as DL[0-31].
Table 7-5. DP[0-7] Signal Assignments
Signal Name  Signal Assignments
DP0          DH[0-7]
DP1          DH[8-15]
DP2          DH[16-23]
DP3          DH[24-31]
DP4          DL[0-7]
DP5          DL[8-15]
DP6          DL[16-23]
DP7          DL[24-31]
7.2.7.2.2
Data Bus Parity (DP[0-7])--Input
Following are the state meaning and timing comments for the DP input signals.
State Meaning
Asserted/Negated--Represents odd parity for each byte of read data. Parity is checked on all data byte lanes, regardless of the size of the transfer. Detected even parity causes a checkstop if data parity errors are enabled in the HID0 register.
Timing Comments
Assertion/Negation--The same as DL[0-31].
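On the input side, parity is checked on all eight byte lanes regardless of transfer size. The Python sketch below illustrates such a check. It is not a description of the MPC750 hardware: the function name data_parity_errors is hypothetical, the 64-bit data value is assumed to hold DH[0-31] in its upper word and DL[0-31] in its lower word, and DH0 is assumed to be the most-significant bit, so that byte lane 0 is the most-significant byte.

```python
def data_parity_errors(data, dp):
    """Return the byte lanes (0-7) whose data plus parity bit show even parity.

    data: 64-bit integer, DH[0-31] in the upper word, DL[0-31] in the lower word
    dp:   sequence of eight parity bits, dp[0] = DP0 ... dp[7] = DP7
    """
    if len(dp) != 8:
        raise ValueError("expected eight parity bits, DP0 through DP7")
    bad_lanes = []
    for lane in range(8):
        # Lane 0 is the most-significant byte under the stated assumption.
        byte = (data >> (56 - 8 * lane)) & 0xFF
        ones = bin(byte).count("1") + (dp[lane] & 1)
        if ones % 2 == 0:  # even parity detected on this lane
            bad_lanes.append(lane)
    return bad_lanes
```

An empty result corresponds to a clean read; any listed lane corresponds to the "detected even parity" condition described above.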
7.2.7.3
Data Bus Disable (DBDIS)--Input
Following are the state meaning and timing comments for the DBDIS signal.
State Meaning
Asserted--Indicates (for a write transaction) that the MPC750 must release the data bus and the data bus parity to high impedance during the following cycle. The data tenure remains active, DBB remains driven, and the transfer termination signals are still monitored by the MPC750.
Negated--Indicates the data bus should remain normally driven. DBDIS is ignored during read transactions.
Timing Comments
Assertion/Negation--May be asserted on any clock cycle when the MPC750 is driving or will be driving the data bus; may remain asserted multiple cycles.
7.2.8
Data Transfer Termination Signals
Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. For a detailed description of how these signals interact, see Section 8.4.4, "Data Transfer Termination."
7.2.8.1
Transfer Acknowledge (TA)--Input
Following are the state meaning and timing comments for the TA signal.
State Meaning
Asserted--Indicates that a single-beat data transfer completed successfully or that a data beat in a burst transfer completed successfully (unless DRTRY is asserted on the next bus clock cycle). Note that TA must be asserted for each data beat in a burst transaction and must be asserted during assertion of DRTRY. For more information, see Section 8.4.4, "Data Transfer Termination."
Negated--(During DBB) indicates that, until TA is asserted, the MPC750 must continue to drive the data for the current write or must wait to sample the data for reads.
Timing Comments
Assertion--Must not occur before AACK for the current transaction (if the address retry mechanism is to be used to prevent invalid data from being used by the processor); otherwise, assertion may occur at any time during the assertion of DBB. The system can withhold assertion of TA to indicate that the MPC750 should insert wait states to extend the duration of the data beat.
Negation--Must occur after the bus clock cycle of the final (or only) data beat of the transfer. For a burst transfer, the system can assert TA for one bus clock cycle and then negate it to advance the burst transfer to the next beat and insert wait states during the next beat.
7.2.8.2
Data Retry (DRTRY)--Input
Following are the state meaning and timing comments for the DRTRY signal.
State Meaning
Asserted--Indicates that the MPC750 must invalidate the data from the previous read operation.
Negated--Indicates that data presented with TA on the previous read operation is valid. Note that DRTRY is ignored for write transactions.
Timing Comments
Assertion--Must occur during the bus clock cycle immediately after TA is asserted if a retry is required. The DRTRY signal may be held asserted for multiple bus clock cycles. When DRTRY is negated, data must have been valid on the previous clock with TA asserted.
Negation--Must occur during the bus clock cycle after a valid data beat. This may occur several cycles after DBB is negated, effectively extending the data bus tenure.
Start-up--The DRTRY signal is sampled at the negation of HRESET; if DRTRY is asserted, no-DRTRY mode is selected. If DRTRY is negated at start-up, DRTRY is enabled.
7.2.8.3
Transfer Error Acknowledge (TEA)--Input
Following are the state meaning and timing comments for the TEA signal.
State Meaning
Asserted--Indicates that a bus error occurred. Causes a machine check exception (and possibly causes the processor to enter checkstop state if machine check enable bit is cleared (MSR[ME] = 0)). For more information, see Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)." Assertion terminates the current transaction; that is, assertion of TA and DRTRY are ignored. The assertion of TEA causes the negation/high impedance of DBB in the next clock cycle. However, data entering the GPR or the cache are not invalidated. (Note that the term 'exception' is also referred to as 'interrupt' in the architecture specification.)
Negated--Indicates that no bus error was detected.
Timing Comments
Assertion--May be asserted while DBB is asserted, and the cycle after TA during a read operation. TEA should be asserted for one cycle only.
Negation--TEA must be negated no later than the negation of DBB.
7.2.9
System Status Signals
Most system status signals are input signals that indicate when exceptions are received, when checkstop conditions have occurred, and when the MPC750 must be reset. The MPC750 generates the output signal, CKSTP_OUT, when it detects a checkstop condition. For a detailed description of these signals, see Section 8.7, "Interrupt, Checkstop, and Reset Signal Operation."
7.2.9.1
Interrupt (INT)--Input
Following are the state meaning and timing comments for the INT signal.
State Meaning
Asserted--The MPC750 initiates an interrupt if MSR[EE] is set; otherwise, the MPC750 ignores the interrupt. To guarantee that the MPC750 will take the external interrupt, INT must be held active until the MPC750 takes the interrupt; otherwise, whether the MPC750 takes an external interrupt depends on whether the MSR[EE] bit was set while the INT signal was held active.
Negated--Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing Comments
Assertion--May occur at any time and may be asserted asynchronously to the input clocks. The INT input is level-sensitive.
Negation--Should not occur until interrupt is taken.
7.2.9.2
System Management Interrupt (SMI)--Input
Following are the state meaning and timing comments for SMI.
State Meaning
Asserted--The MPC750 initiates a system management interrupt operation if MSR[EE] is set; otherwise, the MPC750 ignores the exception condition. The system must hold SMI active until the exception is taken.
Negated--Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing Comments
Assertion--May occur at any time and may be asserted asynchronously to the input clocks. The SMI input is level-sensitive.
Negation--Should not occur until interrupt is taken.
7.2.9.3 Machine Check Interrupt (MCP)--Input
Following are the state meaning and timing comments for the MCP signal. State Meaning Asserted--The MPC750 initiates a machine check interrupt operation if MSR[ME] and HID0[EMCP] are set; if MSR[ME] is cleared and HID0[EMCP] is set, the MPC750 must terminate operation by internally gating off all clocks, and releasing all outputs (except CKSTP_OUT) to the high-impedance state. If HID0[EMCP] is cleared, the MPC750 ignores the interrupt condition. The MCP signal must be held asserted for two bus clock cycles. Negated--Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing Comments Assertion--May occur at any time and may be asserted asynchronously to the input clocks. The MCP input is negative edge-sensitive. Negation--May be negated two bus cycles after assertion.
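The three possible responses to an MCP assertion can be summarized as a small decision function. This is a sketch of the rules stated above; the function name and the returned strings are hypothetical labels, not register values.

```python
def mcp_response(msr_me, hid0_emcp):
    """Classify the processor's response to an asserted MCP input.

    msr_me:    True if MSR[ME] is set.
    hid0_emcp: True if HID0[EMCP] is set.
    """
    if not hid0_emcp:
        # With HID0[EMCP] cleared, the MCP input is ignored.
        return "ignored"
    if msr_me:
        # Both bits set: a machine check interrupt is initiated.
        return "machine check interrupt"
    # HID0[EMCP] set but MSR[ME] cleared: clocks are gated off and outputs
    # (except CKSTP_OUT) are released, i.e. the processor checkstops.
    return "checkstop"
```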
7.2.9.4 Checkstop Input (CKSTP_IN)--Input
Following are the state meaning and timing comments for the CKSTP_IN signal. State Meaning Asserted--Indicates that the MPC750 must terminate operation by internally gating off all clocks, and release all outputs (except CKSTP_OUT) to the high-impedance state. Once CKSTP_IN has been asserted it must remain asserted until the system has been reset. Negated--Indicates that normal operation should proceed. See Section 8.7.2, "Checkstops."
Timing Comments Assertion--May occur at any time and may be asserted asynchronously to the input clocks. Negation--May occur any time after the CKSTP_OUT output signal has been asserted.
7.2.9.5 Checkstop Output (CKSTP_OUT)--Output
Note that the CKSTP_OUT signal is an open-drain type output, and requires an external pull-up resistor (for example, 10 kΩ to Vdd) to assure proper de-assertion of the CKSTP_OUT signal.
Following are the state meaning and timing comments for the CKSTP_OUT signal. State Meaning Asserted--Indicates that the MPC750 has detected a checkstop condition and has ceased operation. Negated--Indicates that the MPC750 is operating normally. See Section 8.7.2, "Checkstops."
Timing Comments Assertion--May occur at any time and may be asserted asynchronously to the MPC750 input clocks. Negation--Is negated upon assertion of HRESET.
7.2.9.6 Reset Signals
There are two reset signals on the MPC750--hard reset (HRESET) and soft reset (SRESET). Descriptions of the reset signals are as follows:
7.2.9.6.1 Hard Reset (HRESET)--Input
The hard reset (HRESET) signal must be used at power-on in conjunction with the TRST signal to properly reset the processor.
Following are the state meaning and timing comments for the HRESET signal. State Meaning Asserted--Initiates a complete hard reset operation when this input transitions from asserted to negated. Causes a reset exception as described in Section 4.5.1, "System Reset Exception (0x00100)." Output drivers are released to high impedance within five clocks after the assertion of HRESET. Negated--Indicates that normal operation should proceed. See Section 8.7.3, "Reset Inputs."
Timing Comments Assertion--May occur at any time and may be asserted asynchronously to the MPC750 input clock; must be held asserted for a minimum of 255 clock cycles after the PLL lock time has been met. Refer to the MPC750 hardware specifications for further timing comments. Negation--May occur any time after the minimum reset pulse width has been met. This input has additional functionality in certain test modes.
7.2.9.6.2 Soft Reset (SRESET)--Input
Following are the state meaning and timing comments for the SRESET signal. State Meaning Asserted--Does not initialize internal resources (different from HRESET assertion). However, initiates processing for a reset exception as described in Section 4.5.1, "System Reset Exception (0x00100)," (same as HRESET). Negated--Indicates that normal operation should proceed. See Section 8.7.3, "Reset Inputs."
Timing Comments Assertion--May occur at any time and may be asserted asynchronously to the MPC750 input clock. The SRESET input is negative-edge sensitive. Negation--May be negated two bus cycles after assertion. This input has additional functionality in certain test modes.
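The HRESET timing requirement (255 clock cycles after the PLL lock time has been met) lends itself to a small worked calculation. The sketch below assumes the 255 cycles are counted in periods of the input clock and takes the PLL lock time as a parameter, because its value comes from the MPC750 hardware specifications and is not given here. The function name is hypothetical.

```python
def min_hreset_assert_time_s(pll_lock_time_s, clock_hz, min_cycles=255):
    """Minimum time HRESET should be held asserted, in seconds.

    pll_lock_time_s: PLL lock time from the hardware specifications (assumed input).
    clock_hz:        input clock frequency in Hz (assumed to be the clock in
                     which the 255 cycles are counted).
    min_cycles:      minimum cycle count after PLL lock (255 per the text above).
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    if pll_lock_time_s < 0:
        raise ValueError("pll_lock_time_s must not be negative")
    return pll_lock_time_s + min_cycles / clock_hz
```

For example, with an illustrative (not specified) lock time of 100 microseconds and a 100 MHz clock, the 255 cycles add 2.55 microseconds to the lock time.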
7.2.9.7 Processor Status Signals
Processor status signals indicate the state of the processor. This includes the memory reservation signal, machine quiesce control signals, time base enable signal, and TLBISYNC signal.
7.2.9.7.1 Quiescent Request (QREQ)--Output
Following are the state meaning and timing comments for QREQ. State Meaning Asserted--Indicates that the MPC750 is requesting all bus activity normally required to be snooped to terminate or to pause so the MPC750 may enter a quiescent (low power) state. When the MPC750 has entered a quiescent state, it no longer snoops bus activity. Negated--Indicates that the MPC750 is not making a request to enter the quiescent state.
Timing Comments Assertion/Negation--May occur on any cycle. QREQ will remain asserted for the duration of the quiescent state.
7.2.9.7.2 Quiescent Acknowledge (QACK)--Input
Following are the state meaning and timing comments for the QACK signal. State Meaning Asserted--Indicates that all bus activity that requires snooping has terminated or paused, and that the MPC750 may enter the quiescent (or low power) state. Negated--Indicates that the MPC750 may not enter a quiescent state, and must continue snooping the bus.
Timing Comments Assertion/Negation--May occur on any cycle following the assertion of QREQ, and must be held asserted for at least one bus clock cycle.
7.2.9.7.3 Reservation (RSRV)--Output
Following are the state meaning and timing comments for RSRV. State Meaning Asserted/Negated--Represents the state of the reservation coherency bit in the reservation address register that is used by the lwarx and stwcx. instructions. See Section 8.8.1, "Support for the lwarx/stwcx. Instruction Pair."
Timing Comments Assertion/Negation--Occurs synchronously with respect to bus clock cycles. The execution of an lwarx instruction sets the internal reservation condition.
7.2.9.7.4 Time Base Enable (TBEN)--Input
Following are the state meaning and timing comments for the TBEN signal. State Meaning Asserted--Indicates that the time base should continue clocking. This input is essentially a count enable control for the time base counter. Negated--Indicates the time base should stop clocking.
Timing Comments Assertion/Negation--May occur on any cycle.
7.2.9.7.5 TLBI Sync (TLBISYNC)--Input
The TLBI Sync (TLBISYNC) signal is an input-only signal on the MPC750.
Following are the state meaning and timing comments for the TLBISYNC signal. State Meaning Asserted--Indicates that instruction execution should stop after execution of a tlbsync instruction. Negated--Indicates that the instruction execution may continue or resume after the completion of a tlbsync instruction.
Timing Comments Assertion/Negation--May occur on any cycle. The TLBISYNC signal must be held negated during HRESET.
7.2.9.7.6 L2 Cache Interface
The MPC750's dedicated L2 cache interface provides all the signals required for the support of up to 1 Mbyte of synchronous SRAM for data storage. The use of the L2 data parity (L2DP[0-7]) and L2 low-power mode enable (L2ZZ) signals is optional, and depends on the SRAMs selected for use with the MPC750. Note that the least-significant bit of L2 address (L2ADDR[16-0]) signals is identified as bit 0, and the most-significant bit is identified as bit 16. Note that the L2 cache interface is not implemented in the MPC740.
7.2.9.8 L2 Address (L2ADDR[16-0])--Output
Following are the state meaning and timing comments for the L2 address output signals. State Meaning Asserted/Negated--Represents the address of the data to be transferred to the L2 cache. The L2 address bus is configured with bit 0 as the least-significant bit. Address bit 14 determines which cache tag set is selected.
Timing Comments Assertion/Negation--Driven valid by the MPC750 during read and write operations; driven with static data when the L2 cache memory is not being accessed.
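The 1-Mbyte limit on external L2 SRAM follows from the width of the L2 address and data buses. The arithmetic below is a sketch that assumes each L2ADDR value selects one 64-bit double word on L2DATA[0-63]; the constant and function names are hypothetical.

```python
L2_ADDRESS_BITS = 17    # L2ADDR[16-0]
L2_DATA_BUS_BYTES = 8   # L2DATA[0-63] carries one double word per access

def l2_addressable_bytes(address_bits=L2_ADDRESS_BITS,
                         bus_bytes=L2_DATA_BUS_BYTES):
    """Bytes reachable when every address selects one bus-wide entry."""
    return (1 << address_bits) * bus_bytes
```

With 17 address bits and an 8-byte data bus this gives 2^17 x 8 bytes, which equals 1 Mbyte.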
7.2.9.9 L2 Data (L2DATA[0-63])
The data bus (L2DATA[0-63]) consists of 64 signals that are both input and output on the MPC750.
7.2.9.9.1 L2 Data (L2DATA[0-63])--Output
Following are the state meaning and timing comments for the L2 data output signals. State Meaning Asserted/Negated--Represents the state of data during a data write transaction; data is always transferred as double words.
Timing Comments Assertion/Negation--Driven valid by MPC750 during write operations; driven with static data when the L2 cache memory is not being accessed by a read operation. High Impedance--Occurs for at least one cycle when changing between read and write operations to the L2 cache memory.
7.2.9.9.2 L2 Data (L2DATA[0-63])--Input
Following are the state meaning and timing comments for the L2 data input signals. State Meaning Asserted/Negated--Represents the state of data during a data read transaction; data is always transferred as double words.
Timing Comments Assertion/Negation--Driven valid by L2 cache memory during read operations.
7.2.9.10 L2 Data Parity (L2DP[0-7])
The eight data bus parity (L2DP[0-7]) signals on the MPC750 are both output and input signals.
7.2.9.10.1 L2 Data Parity (L2DP[0-7])--Output
Following are the state meaning and timing comments for the L2 data parity output signals. State Meaning Asserted/Negated--Represents odd parity for each of the 8 bytes of L2 cache data during write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. Note that parity bit 0 is associated with bits 0-7 (byte lane 0) of the L2DATA bus.
Timing Comments Assertion/Negation--The same as L2DATA[0-63]. High Impedance--The same as L2DATA[0-63].
7.2.9.10.2 L2 Data Parity (L2DP[0-7])--Input
Following are the state meaning and timing comments for the L2 parity input signals. State Meaning Asserted/Negated--Represents odd parity for each byte of L2 cache read data.
Timing Comments Assertion/Negation--The same as L2DATA[0-63].
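The odd-parity rule for L2DP[0-7] can be shown with a short sketch. It assumes that byte lane 0 (L2DATA bits 0-7) is the most-significant byte of the 64-bit value, following the usual PowerPC bit numbering; that mapping and the function name are assumptions for illustration.

```python
def l2_odd_parity_bits(data):
    """Return a list of eight parity bits for a 64-bit double word.

    Element 0 covers byte lane 0 (taken here as the most-significant byte).
    Each parity bit is chosen so that the byte plus its parity bit contain
    an odd number of ones.
    """
    if not 0 <= data < (1 << 64):
        raise ValueError("data must fit in 64 bits")
    parity = []
    for lane in range(8):
        byte = (data >> (8 * (7 - lane))) & 0xFF
        ones = bin(byte).count("1")
        # If the byte already has an odd number of ones, the parity bit is 0.
        parity.append(0 if ones % 2 == 1 else 1)
    return parity
```

A receiver can check a byte lane by confirming that the count of ones in the byte plus the parity bit is odd.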
7.2.9.11 L2 Chip Enable (L2CE)--Output
Following are the state meaning and timing comments for the L2CE signal. State Meaning Asserted--Indicates that the L2 cache memory devices are being selected for a read or write operation. Negated--Indicates that the MPC750 is not selecting the L2 cache memory devices for a read or write operation. Timing Comments Assertion/Negation--May occur on any cycle. L2CE is driven high during HRESET assertion.
7.2.9.12 L2 Write Enable (L2WE)--Output
Following are the state meaning and timing comments for the L2WE signal. State Meaning Asserted--Indicates that the MPC750 is performing a write operation to the L2 cache memory. Negated--Indicates that the MPC750 is not performing an L2 cache memory write operation. Timing Comments Assertion/Negation--May occur on any cycle. L2WE is driven high during HRESET assertion.
7.2.9.13 L2 Clock Out A (L2CLK_OUTA)--Output
Following are the state meaning and timing comments for the L2CLK_OUTA signal. State Meaning Asserted/Negated--Clock output for L2 cache memory devices. The L2CLK_OUTA signal is identical and synchronous with the L2CLK_OUTB signal, and provides the capability to drive up to four L2 cache memory devices. If differential L2 clocking is configured through the setting of the L2CR, the L2CLK_OUTB signal is driven phase inverted with relation to the L2CLK_OUTA signal.
Timing Comments Assertion/Negation--Refer to the MPC750 hardware specifications for timing comments. The L2CLK_OUTA signal is driven low during assertion of HRESET.
7.2.9.14 L2 Clock Out B (L2CLK_OUTB)--Output
Following are the state meaning and timing comments for the L2CLK_OUTB signal. State Meaning Asserted/Negated--Clock output for L2 cache memory devices. The L2CLK_OUTB signal is identical and synchronous with the L2CLK_OUTA signal, and provides the capability to drive up to four L2 cache memory devices. If differential L2 clocking is configured through the setting of the L2CR, the L2CLK_OUTA signal is driven phase inverted with relation to the L2CLK_OUTB signal.
Timing Comments Assertion/Negation--Refer to the MPC750 hardware specifications for timing comments. The L2CLK_OUTB signal is driven low during assertion of HRESET.
7.2.9.15 L2 Sync Out (L2SYNC_OUT)--Output
Following are the state meaning and timing comments for the L2SYNC_OUT signal. State Meaning Asserted/Negated--Clock output for L2 clock synchronization. The L2SYNC_OUT signal should be routed half of the trace length to the L2 cache memory devices and returned to the L2SYNC_IN signal input.
Timing Comments Assertion/Negation--Refer to the MPC750 hardware specifications for timing comments. The L2SYNC_OUT signal is driven low during assertion of HRESET.
7.2.9.16 L2 Sync In (L2SYNC_IN)--Input
Following are the state meaning and timing comments for the L2SYNC_IN signal. State Meaning Asserted/Negated--Clock input for L2 clock synchronization. The L2SYNC_IN signal is driven by the L2SYNC_OUT signal output.
Timing Comments Assertion/Negation--Refer to the MPC750 hardware specifications for timing comments. The routing of this signal on the printed circuit board should ensure that the rising edge at L2SYNC_IN is coincident with the rising edge of the clock at the clock input of the L2 cache memory devices.
7.2.9.17 L2 Low-Power Mode Enable (L2ZZ)--Output
Following are the state meaning and timing comments for the L2ZZ signal. State Meaning Asserted/Negated--Enables low-power mode for certain L2 cache memory devices. Operation of the signal is enabled through the L2CR.
Timing Comments Assertion/Negation--Occurs synchronously with the L2 clock when the MPC750 enters and exits the nap or sleep power modes; after negation of this signal, at least two L2 clock cycles will elapse before L2 cache operations resume. The L2ZZ signal is driven low during assertion of HRESET.
7.2.10 IEEE 1149.1a-1993 Interface Description
The MPC750 has five dedicated JTAG signals which are described in Table 7-6. The test data input (TDI) and test data output (TDO) scan ports are used to scan instructions as well as data into the various scan registers for JTAG operations. The scan operation is controlled by the test access port (TAP) controller which in turn is controlled by the test mode select (TMS) input sequence. The scan data is latched in at the rising edge of test clock (TCK).
Table 7-6. IEEE Interface Pin Descriptions
Signal Name   Input/Output   Weak Pullup Provided   IEEE 1149.1a Function
TDI           Input          Yes                    Serial scan input signal
TDO           Output         No                     Serial scan output signal
TMS           Input          Yes                    TAP controller mode signal
TCK           Input          Yes                    Scan clock
TRST          Input          Yes                    TAP controller reset
Test reset (TRST) is a JTAG optional signal which is used to reset the TAP controller asynchronously. The TRST signal assures that the JTAG logic does not interfere with the normal operation of the chip, and must be asserted and deasserted coincident with the assertion of the HRESET signal.
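To make concrete how the TMS input sequence controls the TAP controller, the sketch below encodes the standard 16-state TAP state diagram defined by IEEE 1149.1. The state names follow the standard; the table and function names are hypothetical, and nothing here is specific to the MPC750 implementation.

```python
# state -> (next state if TMS = 0, next state if TMS = 1), sampled on rising TCK
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def tap_walk(state, tms_bits):
    """Apply a sequence of TMS values (one per TCK rising edge)."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

A property of this state diagram is that five consecutive TCK cycles with TMS high reach Test-Logic-Reset from any state, which gives a synchronous reset path in addition to the asynchronous TRST input.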
7.2.11 Clock Signals
The MPC750 clock signal inputs determine the system clock frequency and provide a flexible clocking scheme that allows the processor to operate at an integer multiple of the system clock frequency. Refer to the MPC750 hardware specifications for exact timing relationships of the clock signals.
7.2.11.1 System Clock (SYSCLK)--Input
The MPC750 requires a single system clock (SYSCLK) input. This input sets the frequency of operation for the bus interface. Internally, the MPC750 uses a phase-locked loop (PLL) circuit to generate a master clock for all of the CPU circuitry (including the bus interface circuitry) which is phase-locked to the SYSCLK input. The master clock may be set to an integer or half-integer multiple (2:1, 2.5:1, 3:1, 3.5:1, 4:1, 4.5:1, 5:1, 5.5:1, 6:1, 6.5:1, or 7:1) of the SYSCLK frequency allowing the CPU core to operate at an equal or greater frequency than the bus interface. State Meaning Asserted/Negated--The SYSCLK input is the primary clock input for the MPC750, and represents the bus clock frequency for MPC750 bus operation. Internally, the MPC750 may be operating at an integer or half-integer multiple of the bus clock frequency.
Timing Comments Duty cycle--Refer to the MPC750 hardware specifications for timing comments. Note: SYSCLK is used as the frequency reference for the internal PLL clock generator, and must not be suspended or varied during normal operation to ensure proper PLL operation.
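The relationship between SYSCLK and the internal master clock can be written as a short calculation. The sketch below uses only the clock ratios listed in Section 7.2.11.1; it does not model the PLL_CFG[0-3] encodings, which are given in the hardware specifications. The names are hypothetical.

```python
# Core-to-bus clock ratios listed in the SYSCLK description above.
SUPPORTED_RATIOS = (2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0)

def is_supported_ratio(ratio):
    """True if the ratio appears in the list given in the text."""
    return ratio in SUPPORTED_RATIOS

def core_frequency_hz(sysclk_hz, ratio):
    """Internal master clock frequency for a given SYSCLK and ratio."""
    if not is_supported_ratio(ratio):
        raise ValueError("ratio is not in the list given in the manual text")
    return sysclk_hz * ratio
```

For instance, a 66 MHz SYSCLK with a 4:1 ratio corresponds to a 264 MHz core clock. Whether a particular combination is allowed also depends on the frequency limits in the hardware specifications.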
7.2.11.2 Clock Out (CLK_OUT)--Output
The clock out (CLK_OUT) signal is an output signal (output-only) on the MPC750. Following are the state meaning and timing comments for the CLK_OUT signal. State Meaning Asserted/Negated--Provides PLL clock output for PLL testing and monitoring. The configuration of the HID0[SBCLK] and HID0[ECLK] bits determines whether the CLK_OUT signal clocks at either the processor clock frequency, the bus clock frequency, or half of the bus clock frequency. See Table 2-5 for HID0 register configuration of the CLK_OUT signal. The CLK_OUT signal defaults to a high-impedance state following the assertion of HRESET. The CLK_OUT signal is provided for testing only.
Timing Comments Assertion/Negation--Refer to the MPC750 hardware specifications for timing comments.
7.2.11.3 PLL Configuration (PLL_CFG[0-3])--Input
The PLL (phase-locked loop) is configured by the PLL_CFG[0-3] signals. For a given SYSCLK (bus) frequency, the PLL configuration signals set the internal CPU frequency of operation. Refer to the MPC750 hardware specifications for PLL configuration. Following are the state meaning and timing comments for the PLL_CFG[0-3] signals. State Meaning Asserted/Negated-- Configures the operation of the PLL and the internal processor clock frequency. Settings are based on the desired bus and internal frequency of operation.
Timing Comments Assertion/Negation--Must remain stable during operation; should only be changed during the assertion of HRESET or during sleep mode. These bits may be read through the PC[0-3] bits in the HID1 register.
7.2.12 Power and Ground Signals
The MPC750 provides the following connections for power and ground:
* VDD--The VDD signals provide the supply voltage connection for the processor core.
* OVDD--The OVDD signals provide the supply voltage connection for the system interface drivers.
* L2VDD--The L2VDD signals provide the supply voltage connection for the L2 cache interface drivers. These power supply signals are isolated from the VDD and OVDD power supply signals. These signals are not implemented on the MPC740.
* AVDD--The AVDD power signal provides power to the clock generation phase-locked loop. See the MPC750 hardware specifications for information on how to use this signal.
* L2AVDD--The L2AVDD power signal provides power to the L2 delay-locked loop. See the MPC750 hardware specifications for information on how to use this signal. This signal is not implemented on the MPC740.
* GND and OGND--The GND and OGND signals provide the connection for grounding the MPC750. On the MPC750, there is no electrical distinction between the GND and OGND signals.
* L2GND--The L2GND signals provide the ground connection for the L2 cache interface. These ground signals are isolated from the GND and OGND ground signals. These signals are not implemented on the MPC740.
Chapter 8 System Interface Operation
This chapter describes the MPC750 microprocessor bus interface and its operation. It shows how the MPC750 signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
8.1 MPC750 System Interface Overview
The system interface prioritizes requests for bus operations from the instruction and data caches, and performs bus operations in accordance with the 60x bus protocol. It includes address register queues, prioritization logic, and a bus control unit. The system interface latches snoop addresses for snooping in the data cache and in the address register queues, and for reservations controlled by the Load Word and Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.) instructions, and maintains the touch load address for the cache. The interface allows one level of pipelining; that is, with certain restrictions discussed later, there can be two outstanding transactions at any given time. Accesses are prioritized with load operations preceding store operations.
Instructions are automatically fetched from the memory system into the instruction unit where they are dispatched to the execution units at a peak rate of two instructions per clock. Conversely, load and store instructions explicitly specify the movement of operands to and from the integer and floating-point register files and the memory system.
When the MPC750 encounters an instruction or data access, it calculates the logical address (effective address in the architecture specification) and uses the low-order address bits to check for a hit in the on-chip, 32-Kbyte instruction and data caches. During cache lookup, the instruction and data memory management units (MMUs) use the higher-order address bits to calculate the virtual address, from which they calculate the physical address (real address in the architecture specification). The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction or data cache. If the access misses in the corresponding cache, the physical address is used to access the L2 cache tags (if the L2 cache is enabled). If no match is found in the L2 cache tags, the physical address is used to access system memory.
In addition to the loads, stores, and instruction fetches, the MPC750 performs hardware table search operations following TLB misses, L2 cache cast-out operations when least-recently used cache lines are written to memory after a cache miss, and cache-line snoop push-out operations when a modified cache line experiences a snoop hit from another bus master.
Figure 8-1 shows the address path from the execution units and instruction fetcher, through the translation logic to the caches and system interface logic.
The MPC750 uses separate address and data buses and a variety of control and status signals for performing reads and writes. The address bus is 32 bits wide and the data bus is 64 bits wide. The interface is synchronous--all MPC750 inputs are sampled at and all outputs are driven from the rising edge of the bus clock. The processor runs at a multiple of the bus-clock speed. The MPC750 core operates at 2.5 volts, and the I/O signals operate at 3.3 volts.
8.1.1 Operation of the Instruction and Data L1 Caches
The MPC750 provides independent instruction and data L1 caches. Each cache is a physically-addressed, 32-Kbyte cache with eight-way set associativity. Both caches consist of 128 sets of eight cache lines, with eight words in each cache line.
Because the data cache on the MPC750 is an on-chip, write-back primary cache, the predominant type of transaction for most applications is burst-read memory operations, followed by burst-write memory operations and single-beat (noncacheable or write-through) memory read and write operations. Additionally, there can be address-only operations, variants of the burst and single-beat operations (global memory operations that are snooped, and atomic memory operations, for example), and address retry activity (for example, when a snooped read access hits a modified line in the cache).
Since the MPC750 data cache tags are single ported, simultaneous load or store and snoop accesses cause resource contention. Snoop accesses have the highest priority and are given first access to the tags, unless the snoop access coincides with a tag write, in which case the snoop is retried and must re-arbitrate for access to the cache. Loads or stores that are deferred due to snoop accesses are performed on the clock cycle following the snoop.
The MPC750 supports a three-state coherency protocol that supports the modified, exclusive, and invalid (MEI) cache states. The protocol is a subset of the MESI (modified/exclusive/shared/invalid) four-state protocol and operates coherently in systems that contain four-state caches.
With the exception of the dcbz instruction (and the dcbi, dcbst, and dcbf instructions, if HID0[ABE] is enabled), the MPC750 does not broadcast cache control instructions. The cache control instructions are intended for the management of the local cache but not for other caches in the system.
Cache lines in the MPC750 are loaded in four beats of 64 bits each. The burst load is performed as critical double word first. The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays. If subsequent loads follow in sequential order, the instructions or data will be forwarded to the requesting unit as the cache block is written.
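The cache geometry given above (128 sets, eight ways, eight 4-byte words per line) implies a particular split of a 32-bit physical address. The sketch below assumes the conventional arrangement in which the lowest bits select the byte within the line and the next bits select the set; the text does not spell out the exact bit assignment, so treat the field boundaries and names as illustrative.

```python
LINE_BYTES = 32   # eight 4-byte words per cache line
NUM_SETS = 128
NUM_WAYS = 8

OFFSET_BITS = 5   # log2(LINE_BYTES)
INDEX_BITS = 7    # log2(NUM_SETS)

def split_physical_address(pa):
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    if not 0 <= pa < (1 << 32):
        raise ValueError("physical address must fit in 32 bits")
    offset = pa & (LINE_BYTES - 1)
    index = (pa >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

The product LINE_BYTES x NUM_SETS x NUM_WAYS is 32 Kbytes, matching the stated cache size, and the offset and index together cover 12 address bits, leaving a 20-bit tag.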
[Block diagram showing the instruction unit (fetcher, branch processing unit with 64-entry BTIC, BHT, CTR, and LR, six-word instruction queue, and dispatch unit), integer units 1 and 2, the system register unit, the load/store unit, the floating-point unit, the completion unit with six-entry reorder buffer, the GPR and FPR files with rename buffers, the instruction and data MMUs, the 32-Kbyte instruction and data caches, the 60x bus interface unit (32-bit address bus, 64-bit data bus), and the L2 bus interface unit with L2 controller, L2CR, and L2 tags (17-bit L2 address bus, 64-bit L2 data bus; not in the MPC740). Additional features: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management.]
Figure 8-1. MPC750 Microprocessor Block Diagram
Cache lines are selected for replacement based on a pseudo least-recently-used (PLRU) algorithm. Each time a cache line is accessed, it is tagged as the most-recently-used line of the set. When a miss occurs, and all eight lines in the set are marked as valid, the least recently used line is replaced with the new data. When data to be replaced is in the modified state, the modified data is written into a write-back buffer while the missed data is being read from memory. When the load completes, the MPC750 then pushes the replaced line from the write-back buffer to the L2 cache (if enabled), or to main memory in a burst write operation.
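The text states only that replacement is pseudo-LRU. As an illustration, the sketch below implements a common binary-tree PLRU scheme for one eight-way set, using seven state bits. It is an assumption that this matches the general approach; it does not reproduce the MPC750's documented bit encoding, and it ignores the preference for filling invalid lines first. The class and method names are hypothetical.

```python
class TreePLRU8:
    """Binary-tree pseudo-LRU state for one 8-way set (7 bits)."""

    def __init__(self):
        # bits[n] gives the direction toward the next victim at tree node n:
        # 0 = left subtree, 1 = right subtree. Node n has children 2n+1, 2n+2.
        self.bits = [0] * 7

    def touch(self, way):
        """Mark `way` (0-7) as most recently used."""
        node = 0
        for level in (2, 1, 0):
            went_right = (way >> level) & 1
            # Point this node away from the subtree that was just used.
            self.bits[node] = 1 - went_right
            node = 2 * node + 1 + went_right

    def victim(self):
        """Return the way that would be replaced next."""
        node = 0
        way = 0
        for _ in range(3):
            go_right = self.bits[node]
            way = (way << 1) | go_right
            node = 2 * node + 1 + go_right
        return way
```

After ways 0 through 7 are touched in order, the victim is way 0, as true LRU would also choose. If way 0 is then touched again, this scheme selects way 4 rather than the true least-recently-used way 1, which shows why the algorithm is called pseudo-LRU.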
8.1.2 Operation of the L2 Cache
The MPC750 provides an on-chip, two-way set associative tag memory, and a dedicated L2 cache port with support for up to 1 Mbyte of external synchronous SRAMs for data storage. The L2 cache normally operates in copy-back mode and supports system cache coherency through snooping. Designers should note that the MPC740 does not implement the on-chip L2 tag memory, or the signals required for the support of the external SRAMs, and memory accesses go directly to the bus interface unit. The L2 cache receives independent memory access requests from both the L1 instruction and data caches. The L1 accesses are compared to the L2 cache tags and the data or instructions are forwarded from the L2 to the L1 cache if there is a cache hit, or are forwarded on to the bus interface unit if there is an L2 cache miss, or if the address being accessed is from a page marked as caching-inhibited. Burst read accesses that miss in the L2 cache initiate a load operation from the bus interface. As the load operation transfers data to the L1 cache, the data is also loaded into the L2 cache, and marked as valid unmodified in the L2 cache tags. An L1 load, store, or castout operation can cause an L2 cache block allocation resulting in the castout of an L2 cache block marked modified to the bus interface. For additional information about the operation of the L2 cache, refer to Chapter 9, "L2 Cache Interface Operation."
8.1.3 Operation of the System Interface
Memory accesses can occur in single-beat (1, 2, 3, 4, and 8 bytes) and four-beat (32 bytes) burst data transfers. The address and data buses are independent for memory accesses to support pipelining and split transactions. The MPC750 can pipeline as many as two transactions and has limited support for out-of-order split-bus transactions.
Access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the MPC750 to be integrated into systems that implement various fairness and bus-parking procedures to avoid arbitration overhead.
Typically, memory accesses are weakly ordered to maximize the efficiency of the bus without sacrificing coherency of the data. The MPC750 allows load operations to bypass store operations (except when a dependency exists). In addition, the MPC750 can be configured to reorder high-priority store operations ahead of lower-priority store operations. Because the processor can dynamically optimize run-time ordering of load/store traffic, overall performance is improved. Note that the synchronize (sync) and enforce in-order execution of I/O (eieio) instructions can be used to enforce strong ordering.
The following sections describe how the MPC750 interface operates, providing detailed timing diagrams that illustrate how the signals interact. A collection of more general timing diagrams is included as examples of typical bus operations. Figure 8-2 is a legend of the conventions used in the timing diagrams.
[Legend of timing diagram symbols; the graphics are not reproduced. Entries:]
* Bar over signal name indicates active low
* ap0: MPC750 input (while MPC750 is a bus master)
* BR: MPC750 output (while MPC750 is a bus master)
* ADDR+: MPC750 output (grouped: here, address plus attributes)
* qual BG: MPC750 internal signal (inaccessible to the user, but used in diagrams to clarify operations)
* Compelling dependency--event will occur on the next clock cycle
* Prerequisite dependency--event will occur on an undetermined subsequent clock cycle
* MPC750 three-state output or input
* MPC750 nonsampled input
* Signal with sample point
* A sampled condition (dot on high or low state) with multiple dependencies
* Timing for a signal had it been asserted (it is not actually asserted)
Figure 8-2. Timing Diagram Legend
This is a synchronous interface--all MPC750 input signals are sampled and output signals are driven on the rising edge of the bus clock cycle (see the MPC750 hardware specifications for exact timing information).
8.1.4 Direct-Store Accesses
The MPC750 does not support the extended transfer protocol for accesses to the direct-store storage space. The transfer protocol used for any given access is selected by the T bit in the MMU segment registers; if the T bit is set, the memory access is a direct-store access. An attempt to access instructions or data in a direct-store segment will result in the MPC750 taking an ISI or DSI exception.
8.2
Memory Access Protocol
Memory accesses are divided into address and data tenures. Each tenure has three phases--bus arbitration, transfer, and termination. The MPC750 also supports address-only transactions. Note that address and data tenures can overlap, as shown in Figure 8-3. Figure 8-3 shows that the address and data tenures are distinct from one another and that both consist of three phases--arbitration, transfer, and termination. Address and data tenures are independent (indicated in Figure 8-3 by the fact that the data tenure begins before the address tenure ends), which allows split-bus transactions to be implemented at the system level in multiprocessor systems. Figure 8-3 shows a data transfer that consists of a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cache lines require data transfer termination signals for each beat of data.
[Diagram: an address tenure (arbitration, transfer, termination) overlapping an independent data tenure (arbitration, single-beat transfer, termination)]
Figure 8-3. Overlapping Tenures on the MPC750 Bus for a Single-Beat Transfer
The basic functions of the address and data tenures are as follows:
* Address tenure
-- Arbitration: During arbitration, address bus arbitration signals are used to gain mastership of the address bus.
-- Transfer: After the MPC750 is the address bus master, it transfers the address on the address bus. The address signals and the transfer attribute signals control the address transfer. The address parity and address parity error signals ensure the integrity of the address transfer.
-- Termination: After the address transfer, the system signals that the address tenure is complete or that it must be repeated.
* Data tenure
-- Arbitration: To begin the data tenure, the MPC750 arbitrates for mastership of the data bus.
-- Transfer: After the MPC750 is the data bus master, it samples the data bus for read operations or drives the data bus for write operations. The data parity and data parity error signals ensure the integrity of the data transfer.
-- Termination: Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat.
The MPC750 generates an address-only bus transfer during the execution of the dcbz instruction (and for the dcbi, dcbf, dcbst, sync, and eieio instructions, if HID0[ABE] is enabled), which uses only the address bus with no data transfer involved. Additionally, the MPC750's retry capability provides an efficient snooping protocol for systems with multiple memory systems (including caches) that must remain coherent.
8.2.1
Arbitration Signals
Arbitration for both address and data bus mastership is performed by a central, external arbiter and, minimally, by the arbitration signals shown in Section 7.2.1, "Address Bus Arbitration Signals." Most arbiter implementations require additional signals to coordinate bus master/slave/snooping activities. Note that address bus busy (ABB) and data bus busy (DBB) are bidirectional signals. These signals are inputs unless the MPC750 has mastership of one or both of the respective buses; they must be connected high through pull-up resistors so that they remain negated when no devices have control of the buses.
The following list describes the address arbitration signals:
* BR (bus request)--Assertion indicates that the MPC750 is requesting mastership of the address bus.
* BG (bus grant)--Assertion indicates that the MPC750 may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are negated. If the MPC750 is parked, BR need not be asserted for the qualified bus grant.
* ABB (address bus busy)--Assertion by the MPC750 indicates that the MPC750 is the address bus master.
The following list describes the data arbitration signals:
* DBG (data bus grant)--Indicates that the MPC750 may, with the proper qualification, assume mastership of the data bus. A qualified data bus grant occurs when DBG is asserted while DBB, DRTRY, and ARTRY are negated. The DBB signal is driven by the current bus master, DRTRY is only driven from the bus, and ARTRY is from the bus, but only for the address bus tenure associated with the current data bus tenure (that is, not from another address tenure).
* DBWO (data bus write only)--Assertion indicates that the MPC750 may perform the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If DBWO is asserted, the MPC750 will assume data bus mastership for a pending data bus write operation; the MPC750 will take the data bus for a pending read operation if this input is asserted along with DBG and no write is pending. Care must be taken with DBWO to ensure the desired write is queued (for example, a cache-line snoop push-out operation).
* DBB (data bus busy)--Assertion by the MPC750 indicates that the MPC750 is the data bus master. The MPC750 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
For more detailed information on the arbitration signals, refer to Section 7.2.1, "Address Bus Arbitration Signals," and Section 7.2.6, "Data Bus Arbitration Signals."
8.2.2
Address Pipelining and Split-Bus Transactions
The MPC750 protocol provides independent address and data bus capability to support pipelined and split-bus transaction system organizations. Address pipelining allows the address tenure of a new bus transaction to begin before the data tenure of the current transaction has finished. Split-bus transaction capability allows other bus activity to occur (either from the same master or from different masters) between the address and data tenures of a transaction. While this capability does not inherently reduce memory latency, support for address pipelining and split-bus transactions can greatly improve effective bus/memory throughput. For this reason, these techniques are most effective in shared-memory multimaster implementations where bus bandwidth is an important measurement of system performance. External arbitration is required in systems in which multiple devices must compete for the system bus. The design of the external arbiter affects pipelining by regulating address bus grant (BG), data bus grant (DBG), and address acknowledge (AACK) signals. For example, a one-level pipeline is enabled by asserting AACK to the current address bus master and
granting mastership of the address bus to the next requesting master before the current data bus tenure has completed. Two address tenures can occur before the current data bus tenure completes. The MPC750 can pipeline its own transactions to a depth of one level (intraprocessor pipelining); however, the MPC750 bus protocol does not constrain the maximum number of levels of pipelining that can occur on the bus between multiple masters (interprocessor pipelining). The external arbiter must control the pipeline depth and synchronization between masters and slaves. In a pipelined implementation, data bus tenures are kept in strict order with respect to address tenures. However, external hardware can further decouple the address and data buses, allowing the data tenures to occur out of order with respect to the address tenures. This requires some form of system tag to associate the out-of-order data transaction with the proper originating address transaction (not defined for the MPC750 interface). Individual bus requests and data bus grants from each processor can be used by the system to implement tags to support interprocessor, out-of-order transactions. The MPC750 supports a limited intraprocessor out-of-order, split-transaction capability via the data bus write only (DBWO) signal. For more information about using DBWO, see Section 8.10, "Using Data Bus Write Only."
8.3
Address Bus Tenure
This section describes the three phases of the address tenure--address bus arbitration, address transfer, and address termination.
8.3.1
Address Bus Arbitration
When the MPC750 needs access to the external bus and it is not parked (BG is negated), it asserts bus request (BR) until it is granted mastership of the bus and the bus is available (see Figure 8-4). The external arbiter must grant master-elect status to the potential master by asserting the bus grant (BG) signal. The MPC750 requesting the bus determines that the bus is available when the ABB input is negated. A qualified bus grant exists when BG is asserted while the address bus is not busy (ABB input is negated) and the address retry (ARTRY) input is negated. The potential master assumes address bus mastership by asserting ABB when it receives a qualified bus grant.
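The qualified bus grant condition can be modeled in a few lines for use in a bus simulation or arbiter test bench. The following Python sketch is illustrative only; the function name and the convention of passing raw pin levels (0 = asserted, 1 = negated, because BG, ABB, and ARTRY are active-low) are assumptions of this example, not part of the MPC750 specification.

```python
def qualified_bus_grant(bg_n, abb_n, artry_n):
    """Return True when a qualified bus grant exists.

    Arguments are sampled pin levels of the active-low signals
    (hypothetical naming): 0 means asserted, 1 means negated.
    """
    bg_asserted = (bg_n == 0)       # arbiter is granting the address bus
    abb_negated = (abb_n == 1)      # address bus is not busy
    artry_negated = (artry_n == 1)  # no address retry in progress
    return bg_asserted and abb_negated and artry_negated


# BG asserted, bus idle, no retry: the grant is qualified.
print(qualified_bus_grant(bg_n=0, abb_n=1, artry_n=1))
# BG asserted but another master still holds ABB: not qualified.
print(qualified_bus_grant(bg_n=0, abb_n=0, artry_n=1))
```

A parked processor sees the same condition; the only difference is that BR need not have been asserted beforehand.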
[Timing diagram over logical bus clocks -1 to 1; signals shown: need_bus, BR, bg, abb, artry, qual BG, ABB]
Figure 8-4. Address Bus Arbitration
External arbiters must allow only one device at a time to be the address bus master. In implementations in which no other device can be a master, BG can be grounded (always asserted) to continually grant mastership of the address bus to the MPC750. If the MPC750 asserts BR before the external arbiter asserts BG, the MPC750 is considered to be unparked, as shown in Figure 8-4. Figure 8-5 shows the parked case, where a qualified bus grant exists on the clock edge following a need_bus condition. Notice that the bus clock cycle required for arbitration is eliminated if the MPC750 is parked, reducing overall memory latency for a transaction. The MPC750 always negates ABB for at least one bus clock cycle after AACK is asserted, even if it is parked and has another transaction pending. Typically, bus parking is provided to the device that was the most recent bus master; however, system designers may choose other schemes such as providing unrequested bus grants in situations where it is easy to correctly predict the next device requesting bus mastership.
[Timing diagram over logical bus clocks -1 to 1; signals shown: need_bus, BR, bg, abb, artry, qual BG, ABB]
Figure 8-5. Address Bus Arbitration Showing Bus Parking
When the MPC750 receives a qualified bus grant, it assumes address bus mastership by asserting ABB and negating the BR output signal. Meanwhile, the MPC750 drives the address for the requested access onto the address bus and asserts TS to indicate the start of a new transaction. When designing external bus arbitration logic, note that the MPC750 may assert BR without using the bus after it receives the qualified bus grant. For example, in a system using bus snooping, if the MPC750 asserts BR to perform a replacement copy-back operation, another device can invalidate that line before the MPC750 is granted mastership of the bus. Once the MPC750 is granted the bus, it no longer needs to perform the copy-back operation; therefore, the MPC750 does not assert ABB and does not use the bus for the copy-back operation. Note that the MPC750 asserts BR for at least one clock cycle in these instances. System designers should note that it is possible to ignore the ABB signal, and regenerate the state of ABB locally within each device by monitoring the TS and AACK input signals. The MPC750 allows this operation by using both the ABB input signal and a locally regenerated version of ABB to determine if a qualified bus grant state exists (both sources are internally ORed together). The ABB signal may only be ignored if ABB and TS are asserted simultaneously by all masters, or where arbitration (through assertion of BG) is properly managed in cases where the regenerated ABB may not properly track the ABB signal on the bus. If the MPC750's ABB signal is ignored by the system, it must be connected to a pull-up resistor to ensure proper operation. Additionally, the MPC750 will not qualify a bus grant during the cycle that TS is asserted on the bus by any master. Address
bus arbitration without the use of the ABB signal requires that every assertion of TS be acknowledged by an assertion of AACK while the processor is not in sleep mode.
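The local regeneration of ABB from TS and AACK can be sketched as a small state machine. The Python model below is a simplified illustration under stated assumptions: the class name is hypothetical, signals are passed as booleans meaning "asserted", and the exact clock on which the regenerated signal drops is an assumption of this sketch (the hardware specifications govern the real timing). It does show the ORing of the external ABB input with the regenerated state, as described above.

```python
class AbbTracker:
    """Regenerate address-bus-busy from TS and AACK (illustrative model)."""

    def __init__(self):
        self.local_abb = False  # locally regenerated ABB state

    def clock(self, ts, aack, abb_in):
        """Process one bus clock; return True if the address bus looks busy.

        ts, aack, abb_in are booleans where True means 'asserted'.
        """
        if ts:
            # A transfer start marks the beginning of an address tenure.
            self.local_abb = True
        # The external ABB input and the regenerated copy are ORed.
        busy = self.local_abb or abb_in
        if aack:
            # Assumption: the tenure is treated as finished after the AACK clock.
            self.local_abb = False
        return busy


tracker = AbbTracker()
print(tracker.clock(ts=True, aack=False, abb_in=False))   # tenure starts
print(tracker.clock(ts=False, aack=True, abb_in=False))   # AACK clock
print(tracker.clock(ts=False, aack=False, abb_in=False))  # bus idle again
```

Because the model clears its state only on AACK, it also shows why every assertion of TS must be acknowledged by AACK when the ABB signal is not used: an unacknowledged TS would leave the regenerated state busy indefinitely.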
8.3.2
Address Transfer
During the address transfer, the physical address and all attributes of the transaction are transferred from the bus master to the slave device(s). Snooping logic may monitor the transfer to enforce cache coherency; see discussion about snooping in Section 8.3.3, "Address Transfer Termination." The signals used in the address transfer include the following signal groups:
* Address transfer start signal: transfer start (TS)
* Address transfer signals: address bus (A[0-31]), and address parity (AP[0-3])
* Address transfer attribute signals: transfer type (TT[0-4]), transfer size (TSIZ[0-2]), transfer burst (TBST), cache inhibit (CI), write-through (WT), and global (GBL)
Figure 8-6 shows that the timing for all of these signals, except TS, is identical. All of the address transfer and address transfer attribute signals are combined into the ADDR+ grouping in Figure 8-6. The TS signal indicates that the MPC750 has begun an address transfer and that the address and transfer attributes are valid (within the context of a synchronous bus). The MPC750 always asserts TS coincident with ABB. As an input, TS need not coincide with the assertion of ABB on the bus (that is, TS can be asserted with, or on, a subsequent clock cycle after ABB is asserted; the MPC750 tracks this transaction correctly). In Figure 8-6, the address transfer occurs during bus clock cycles 1 and 2 (arbitration occurs in bus clock cycle 0 and the address transfer is terminated in bus clock 3). In this diagram, the address bus termination input, AACK, is asserted to the MPC750 on the bus clock following assertion of TS (as shown by the dependency line). This is the minimum duration of the address transfer for the MPC750; the duration can be extended by delaying the assertion of AACK for one or more bus clocks.
[Timing diagram over bus clocks 0 to 4; signals shown: qual BG, TS, ABB, ADDR+, aack, artry_in]
Figure 8-6. Address Bus Transfer
8.3.2.1
Address Bus Parity
The MPC750 always generates 1 bit of correct odd-byte parity for each of the 4 bytes of address when a valid address is on the bus. The calculated values are placed on the AP[0-3] outputs when the MPC750 is the address bus master. If the MPC750 is not the master and TS and GBL are asserted together (qualified condition for snooping memory operations), the calculated values are compared with the AP[0-3] inputs. If there is an error, and address parity checking is enabled (HID0[EBA] set to 1), a machine check exception is generated. An address bus parity error causes a checkstop condition if MSR[ME] is cleared to 0. For more information about checkstop conditions, see Chapter 4, "Exceptions."
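As an illustration of odd parity per address byte, the following Python sketch computes four parity bits for a 32-bit address. It assumes that AP0 covers A[0-7] (the most-significant byte), AP1 covers A[8-15], and so on, and that "odd parity" means each parity bit is chosen so that the byte plus its parity bit contains an odd number of ones. The function name is hypothetical.

```python
def address_parity(addr):
    """Return [AP0, AP1, AP2, AP3] odd-parity bits for a 32-bit address.

    Assumption: AP0 protects the most-significant byte A[0-7] and
    AP3 protects the least-significant byte A[24-31].
    """
    parity_bits = []
    for i in range(4):
        byte = (addr >> (24 - 8 * i)) & 0xFF
        ones = bin(byte).count("1")
        # Odd parity: total ones in (byte + parity bit) must be odd.
        parity_bits.append(0 if ones % 2 == 1 else 1)
    return parity_bits


print(address_parity(0x00000000))  # every byte has zero ones, so each bit is 1
print(address_parity(0x01030000))
```

A snooping device would compute the same four bits from the sampled address and compare them with the AP[0-3] inputs.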
8.3.2.2
Address Transfer Attribute Signals
The transfer attribute signals include several encoded signals such as the transfer type (TT[0-4]) signals, transfer burst (TBST) signal, transfer size (TSIZ[0-2]) signals, write-through (WT), and cache inhibit (CI). Section 7.2.4, "Address Transfer Attribute Signals," describes the encodings for the address transfer attribute signals.
8.3.2.2.1
Transfer Type (TT[0-4]) Signals
Snooping logic should fully decode the transfer type signals if the GBL signal is asserted. Slave devices can sometimes use the individual transfer type signals without fully decoding the group. For a complete description of the encoding for TT[0-4], refer to Section 7.2.4, "Address Transfer Attribute Signals."
8.3.2.2.2
Transfer Size (TSIZ[0-2]) Signals
The TSIZ[0-2] signals indicate the size of the requested data transfer as shown in Table 8-1. The TSIZ[0-2] signals may be used along with TBST and A[29-31] to determine which portion of the data bus contains valid data for a write transaction or which portion of the bus should contain valid data for a read transaction. Note that for a burst transaction (as indicated by the assertion of TBST), TSIZ[0-2] are always set to 0b010. Therefore, if the TBST signal is asserted, the memory system should transfer a total of eight words (32 bytes), regardless of the TSIZ[0-2] encodings.
Table 8-1. Transfer Size Signal Encodings
TBST      TSIZ0  TSIZ1  TSIZ2  Transfer Size
Asserted  0      1      0      Eight-word burst
Negated   0      0      0      Eight bytes
Negated   0      0      1      One byte
Negated   0      1      0      Two bytes
Negated   0      1      1      Three bytes
Negated   1      0      0      Four bytes
Negated   1      0      1      Five bytes (N/A)
Negated   1      1      0      Six bytes (N/A)
Negated   1      1      1      Seven bytes (N/A)
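The encodings in Table 8-1 can be decoded as in the following Python sketch, which a memory-controller model might use to determine how many bytes to transfer. The function name and argument conventions are assumptions of this example. The sketch does not cover eciwx/ecowx operations, for which TBST and TSIZ[0-2] carry the resource ID instead.

```python
def transfer_size_bytes(tbst_asserted, tsiz):
    """Decode the transfer size in bytes from TBST and TSIZ[0-2].

    tbst_asserted: True when TBST is asserted (burst transfer).
    tsiz: 3-bit integer with TSIZ0 as the most-significant bit.
    """
    if not 0 <= tsiz <= 0b111:
        raise ValueError("TSIZ[0-2] must be a 3-bit value")
    if tbst_asserted:
        # A burst always moves eight words, regardless of TSIZ[0-2].
        return 32
    # 0b000 encodes eight bytes; every other value is its own byte count.
    return 8 if tsiz == 0 else tsiz


print(transfer_size_bytes(True, 0b010))   # burst
print(transfer_size_bytes(False, 0b000))  # double word
print(transfer_size_bytes(False, 0b100))  # word
```

Sizes of 5, 6, and 7 bytes decode without error here, although the MPC750 never generates them.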
The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cache line). Data transfers that cross an aligned, 32-byte boundary either must present a new address onto the bus at that boundary (for coherency consideration) or must operate as noncoherent data with respect to the MPC750. The MPC750 never generates a bus transaction with a transfer size of 5 bytes, 6 bytes, or 7 bytes. For operations generated by the eciwx/ecowx instructions, a transfer size of 4 bytes is implied, and the TBST and TSIZ[0-2] signals are redefined to specify the resource ID (RID). The RID is copied from bits 28-31 of the external access register (EAR). For these operations, the TBST signal carries the EAR[28] data without inversion (active high).
8.3.2.2.3
Write-Through (WT) Signal
The MPC750 provides the WT signal to indicate a write-through operation as determined by the WIM bit settings during address translation by the MMU. The WT signal is also asserted for burst writes due to the execution of the dcbf and dcbst instructions, and snoop push operations. The WT signal is deasserted for accesses caused by the execution of the ecowx instruction. During read operations the MPC750 uses the WT signal to indicate whether the transaction is an instruction fetch (WT set to 1), or a data read operation (WT cleared to 0).
8.3.2.2.4
Cache Inhibit (CI) Signal
The MPC750 indicates the caching-inhibited status of a transaction (determined by the setting of the WIM bits by the MMU) through the use of the CI signal. The CI signal is asserted even if the L1 caches are disabled or locked. This signal is also asserted for bus transactions caused by the execution of eciwx and ecowx instructions independent of the address translation.
8.3.2.3
Burst Ordering During Data Transfers
During burst data transfer operations, 32 bytes of data (one cache line) are transferred to or from the cache in order. Burst write transfers are always performed zero double word first, but since burst reads are performed critical double word first, a burst read transfer may not start with the first double word of the cache line, and the cache line fill may wrap around the end of the cache line. Table 8-2 describes the data bus burst ordering.
Table 8-2. Burst Ordering
                   For Starting Address:
Data Transfer      A[27-28] = 00  A[27-28] = 01  A[27-28] = 10  A[27-28] = 11
First data beat    DW0            DW1            DW2            DW3
Second data beat   DW1            DW2            DW3            DW0
Third data beat    DW2            DW3            DW0            DW1
Fourth data beat   DW3            DW0            DW1            DW2
Note: A[29-31] are always 0b000 for burst transfers by the MPC750.
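The wrap-around ordering in Table 8-2 amounts to a modulo-4 count starting from the critical double word. The following Python sketch reproduces the table; the function name is hypothetical, and A[27-28] is extracted on the assumption that the address is a 32-bit value with A31 as the least-significant bit.

```python
def burst_order(address, is_read=True):
    """Return the double-word order for the four beats of a burst.

    For reads, the critical double word selected by A[27-28] comes
    first and the sequence wraps within the cache line. Burst writes
    always start at double word 0.
    """
    start = (address >> 3) & 0b11 if is_read else 0
    return ["DW%d" % ((start + beat) % 4) for beat in range(4)]


print(burst_order(0x1010))                 # A[27-28] = 10
print(burst_order(0x1018, is_read=False))  # write: always starts at DW0
```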
8.3.2.4
Effect of Alignment in Data Transfers
Table 8-3 lists the aligned transfers that can occur on the MPC750 bus. These are transfers in which the data is aligned to an address that is an integral multiple of the size of the data. For example, Table 8-3 shows that 1-byte data is always aligned; however, for a 4-byte word to be aligned, it must be oriented on an address that is a multiple of 4.
Table 8-3. Aligned Data Transfers
                                                  Data Bus Byte Lane(s)
Transfer Size  TSIZ0  TSIZ1  TSIZ2  A[29-31]  0   1   2   3   4   5   6   7
Byte           0      0      1      000       A   --  --  --  --  --  --  --
               0      0      1      001       --  A   --  --  --  --  --  --
               0      0      1      010       --  --  A   --  --  --  --  --
               0      0      1      011       --  --  --  A   --  --  --  --
               0      0      1      100       --  --  --  --  A   --  --  --
               0      0      1      101       --  --  --  --  --  A   --  --
               0      0      1      110       --  --  --  --  --  --  A   --
               0      0      1      111       --  --  --  --  --  --  --  A
Half word      0      1      0      000       A   A   --  --  --  --  --  --
               0      1      0      010       --  --  A   A   --  --  --  --
               0      1      0      100       --  --  --  --  A   A   --  --
               0      1      0      110       --  --  --  --  --  --  A   A
Word           1      0      0      000       A   A   A   A   --  --  --  --
               1      0      0      100       --  --  --  --  A   A   A   A
Double word    0      0      0      000       A   A   A   A   A   A   A   A
Notes:
A: These entries indicate the byte portions of the requested operand that are read or written during that bus transaction.
--: These entries are not required and are ignored during read transactions and are driven with undefined data during all write transactions.
The MPC750 supports misaligned memory operations, although their use may substantially degrade performance. Misaligned memory transfers address memory that is not aligned to the size of the data being transferred (such as, a word read of an odd byte address). Although most of these operations hit in the primary cache (or generate burst memory operations if they miss), the MPC750 interface supports misaligned transfers within a word (32-bit aligned) boundary, as shown in Table 8-4. Note that the 4-byte transfer in Table 8-4 is only one example of misalignment. As long as the attempted transfer does not cross a word boundary, the MPC750 can transfer the data on the misaligned address (for example, a half-word read from an odd byte-aligned address). An attempt to address data that crosses a word boundary requires two bus transfers to access the data. Due to the performance degradations associated with misaligned memory operations, they are best avoided. In addition to the double-word straddle boundary condition, the address translation logic can generate substantial exception overhead when the load/store multiple and load/store string instructions access misaligned data. It is strongly recommended that software attempt to align data where possible.
Table 8-4. Misaligned Data Transfers (Four-Byte Examples)
                                                   Data Bus Byte Lanes
Transfer Size (Four Bytes)   TSIZ[0-2]  A[29-31]  0   1   2   3   4   5   6   7
Aligned                      100        000       A   A   A   A   --  --  --  --
Misaligned--first access     011        001       --  A   A   A   --  --  --  --
            second access    001        100       --  --  --  --  A   --  --  --
Misaligned--first access     010        010       --  --  A   A   --  --  --  --
            second access    010        100       --  --  --  --  A   A   --  --
Misaligned--first access     001        011       --  --  --  A   --  --  --  --
            second access    011        100       --  --  --  --  A   A   A   --
Aligned                      100        100       --  --  --  --  A   A   A   A
Misaligned--first access     011        101       --  --  --  --  --  A   A   A
            second access    001        000       A   --  --  --  --  --  --  --
Misaligned--first access     010        110       --  --  --  --  --  --  A   A
            second access    010        000       A   A   --  --  --  --  --  --
Misaligned--first access     001        111       --  --  --  --  --  --  --  A
            second access    011        000       A   A   A   --  --  --  --  --
Notes:
A: Byte lane used
--: Byte lane not used
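The rows of Table 8-4 follow from splitting an access at each word boundary. The Python sketch below derives the bus transfers for an operand of 1 to 4 bytes; it is an illustration only, and the function name and tuple layout are assumptions of this example. For byte counts of 1 to 4, the count is numerically equal to the TSIZ[0-2] encoding.

```python
def split_word_access(address, size=4):
    """Split an access of 1 to 4 bytes into bus transfers at word boundaries.

    Returns a list of (A[29-31] value, byte count, byte lanes used),
    one tuple per bus transfer.
    """
    if not 1 <= size <= 4:
        raise ValueError("this sketch handles 1 to 4 bytes only")
    transfers = []
    addr, remaining = address, size
    while remaining > 0:
        # Bytes left before the next word (4-byte) boundary.
        to_boundary = 4 - (addr & 0b11)
        count = min(remaining, to_boundary)
        first_lane = addr & 0b111  # lane within the 64-bit data bus
        lanes = list(range(first_lane, first_lane + count))
        transfers.append((first_lane, count, lanes))
        addr += count
        remaining -= count
    return transfers


# Word access at offset 001: three bytes on lanes 1-3, then one byte on lane 4.
print(split_word_access(0x1001))
# Word access at offset 111: one byte on lane 7, then three bytes on lanes 0-2
# of the next double word.
print(split_word_access(0x1007))
```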
8.3.2.4.1
Alignment of External Control Instructions
The size of the data transfer associated with the eciwx and ecowx instructions is always 4 bytes. If the eciwx or ecowx instruction is misaligned and crosses any word boundary, the MPC750 will generate an alignment exception.
8.3.3
Address Transfer Termination
The address tenure of a bus operation is terminated when completed with the assertion of AACK, or retried with the assertion of ARTRY. The MPC750 does not terminate the address transfer until the AACK (address acknowledge) input is asserted; therefore, the system can extend the address transfer phase by delaying the assertion of AACK to the MPC750. The assertion of AACK can be as early as the bus clock cycle following TS (see Figure 8-7), which allows a minimum address tenure of two bus cycles. As shown in Figure 8-7, these signals are asserted for one bus clock cycle, three-stated for half of the next bus clock cycle, driven high till the following bus cycle, and finally three-stated. Note that AACK must be asserted for only one bus clock cycle.
The address transfer can be terminated with the requirement to retry if ARTRY is asserted anytime during the address tenure and through the cycle following AACK. The assertion causes the entire transaction (address and data tenure) to be rerun. As a snooping device, the MPC750 asserts ARTRY for a snooped transaction that hits modified data in the data cache that must be written back to memory, or if the snooped transaction could not be serviced. As a bus master, the MPC750 responds to an assertion of ARTRY by aborting the bus transaction and re-requesting the bus. Note that after recognizing an assertion of ARTRY and aborting the transaction in progress, the MPC750 is not guaranteed to run the same transaction the next time it is granted the bus due to internal reordering of load and store operations. If an address retry is required, the ARTRY response will be asserted by a bus snooping device as early as the second cycle after the assertion of TS. Once asserted, ARTRY must remain asserted through the cycle after the assertion of AACK. The assertion of ARTRY during the cycle after the assertion of AACK is referred to as a qualified ARTRY. An earlier assertion of ARTRY during the address tenure is referred to as an early ARTRY. As a bus master, the MPC750 recognizes either an early or qualified ARTRY and prevents the data tenure associated with the retried address tenure. If the data tenure has already begun, the MPC750 aborts and terminates the data tenure immediately even if the burst data has been received. If the assertion of ARTRY is received up to or on the bus cycle following the first (or only) assertion of TA for the data tenure, the MPC750 ignores the first data beat, and if it is a load operation, does not forward data internally to the cache and execution units. If ARTRY is asserted after the first (or only) assertion of TA, improper operation of the bus interface may result. 
During the clock of a qualified ARTRY, the MPC750 also determines if it should negate BR and ignore BG on the following cycle. On the following cycle, only the snooping master that asserted ARTRY and needs to perform a snoop copy-back operation is allowed to assert BR. This guarantees the snooping master an opportunity to request and be granted the bus before the just-retried master can restart its transaction. Note that a nonclocked bus arbiter may detect the assertion of address bus request by the bus master that asserted ARTRY, and return a qualified bus grant one cycle earlier than shown in Figure 8-7. Note that if the MPC750 asserts ARTRY due to a snoop operation, and asserts BR in the bus cycle following ARTRY in order to perform a snoop push to memory it may be several bus cycles later before the MPC750 will be able to accept a BG. (The delay in responding to the assertion of BG only occurs during snoop pushes from the L2 cache.) The bus arbiter should keep BG asserted until it detects BR negated or TS asserted from the MPC750 indicating that the snoop copy-back has begun. The system should ensure that no other address tenures occur until the current snoop push from the MPC750 is completed. Snoop push delays can also be avoided by operating the L2 cache in write-through mode so no snoop pushes are required by the L2 cache.
[Timing diagram over bus clocks 1 to 8; signals shown: ts, abb, addr, aack, ARTRY, BR, qual BG, ABB]
Figure 8-7. Snooped Address Cycle with ARTRY
8.4
Data Bus Tenure
This section describes the data bus arbitration, transfer, and termination phases defined by the MPC750 memory access protocol. The phases of the data tenure are identical to those of the address tenure, underscoring the symmetry in the control of the two buses.
8.4.1
Data Bus Arbitration
Data bus arbitration uses the data arbitration signal group--DBG, DBWO, and DBB. Additionally, the combination of TS and TT[0-4] provides information about the data bus request to external logic. The TS signal is an implied data bus request from the MPC750; the arbiter must qualify TS with the transfer type (TT) encodings to determine if the current address transfer is an address-only operation, which does not require a data bus transfer (see Figure 8-7). If the data bus is needed, the arbiter grants data bus mastership by asserting the DBG input to the MPC750. As with the address bus arbitration phase, the MPC750 must qualify the DBG input with a number of input signals before assuming bus mastership, as shown in Figure 8-8.
[Timing diagram over bus clocks 0 to 3; signals shown: TS, dbg, dbb, drtry, qual DBG, DBB]
Figure 8-8. Data Bus Arbitration
A qualified data bus grant can be expressed as the following: QDBG = DBG asserted while DBB, DRTRY, and ARTRY (associated with the data bus operation) are negated. When a data tenure overlaps with its associated address tenure, a qualified ARTRY assertion coincident with a data bus grant signal does not result in data bus mastership (DBB is not asserted). Otherwise, the MPC750 always asserts DBB on the bus clock cycle after recognition of a qualified data bus grant. Since the MPC750 can pipeline transactions, there may be an outstanding data bus transaction when a new address transaction is retried. In this case, the MPC750 becomes the data bus master to complete the previous transaction.
8.4.1.1
Using the DBB Signal
The DBB signal should be connected between masters if data tenure scheduling is left to the masters. Optionally, the memory system can control data tenure scheduling directly with DBG. However, it is possible to ignore the DBB signal in the system if the DBB input is not used as the final data bus allocation control between data bus masters, and if the memory system can track the start and end of the data tenure. If DBB is not used to signal the end of a data tenure, DBG is only asserted to the next bus master the cycle before the cycle that the next bus master may actually begin its data tenure, rather than asserting it earlier (usually during another master's data tenure) and allowing the negation of DBB to be the final gating signal for a qualified data bus grant. Even if DBB is ignored in the system, the MPC750 always recognizes its own assertion of DBB, and requires one cycle after data tenure completion to negate its own DBB before recognizing a qualified data bus
grant for another data tenure. If DBB is ignored in the system, it must still be connected to a pull-up resistor on the MPC750 to ensure proper operation.
8.4.2
Data Bus Write Only
As a result of address pipelining, the MPC750 may have up to two data tenures queued to perform when it receives a qualified DBG. Generally, the data tenures should be performed in strict order (the same order) as their address tenures were performed. The MPC750, however, also supports a limited out-of-order capability with the data bus write only (DBWO) input. When recognized on the clock of a qualified DBG, DBWO may direct the MPC750 to perform the next pending data write tenure even if a pending read tenure would have normally been performed first. For more information on the operation of DBWO, refer to Section 8.10, "Using Data Bus Write Only." If the MPC750 has any data tenures to perform, it always accepts data bus mastership to perform a data tenure when it recognizes a qualified DBG. If DBWO is asserted with a qualified DBG and no write tenure is queued to run, the MPC750 still takes mastership of the data bus to perform the next pending read data tenure. Generally, DBWO should only be used to allow a copy-back operation (burst write) to occur before a pending read operation. If DBWO is used for single-beat write operations, it may negate the effect of the eieio instruction by allowing a write operation to precede a program-scheduled read operation.
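The selection rule described above can be sketched as a simple queue operation. In the following Python model, the pending data tenures are held in address-tenure order as the strings "read" and "write"; the function name and this representation are assumptions made for illustration, not a description of the processor's internal implementation.

```python
def next_data_tenure(pending, dbwo_asserted):
    """Pick the data tenure to run on a qualified DBG.

    pending: list of "read"/"write" strings in address-tenure order;
             the chosen entry is removed from the list.
    dbwo_asserted: True if DBWO is asserted with the qualified DBG.
    Returns the chosen tenure, or None if nothing is pending.
    """
    if not pending:
        return None
    if dbwo_asserted:
        # DBWO lets the first pending write run ahead of earlier reads.
        for index, kind in enumerate(pending):
            if kind == "write":
                return pending.pop(index)
    # Otherwise (or if no write is queued) run tenures in strict order.
    return pending.pop(0)


queue = ["read", "write"]
print(next_data_tenure(queue, dbwo_asserted=True))  # the write runs first
print(queue)                                        # the read is still pending
```

The fall-through case mirrors the text: with DBWO asserted and no write queued, the next pending read tenure is still performed.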
8.4.3 Data Transfer
The data transfer signals include DH[0-31], DL[0-31], and DP[0-7]. For memory accesses, the DH and DL signals form a 64-bit data path for read and write operations.
The MPC750 transfers data in either single- or four-beat burst transfers. Single-beat operations can transfer from 1 to 8 bytes at a time and can be misaligned; see Section 8.3.2.4, "Effect of Alignment in Data Transfers." Burst operations always transfer eight words and are aligned on eight-word address boundaries. Burst transfers can achieve significantly higher bus throughput than single-beat operations.
The type of transaction initiated by the MPC750 depends on whether the code or data is cacheable and, for store operations, whether the cache is in write-back or write-through mode, which software controls on either a page or block basis. Burst transfers support cacheable operations only; that is, memory structures must be marked as cacheable (and write-back for data store operations) in the respective page or block descriptor to take advantage of burst transfers.
The MPC750 output TBST indicates to the system whether the current transaction is a single- or four-beat transfer (except during eciwx/ecowx transactions, when it signals the state of EAR[28]). A burst transfer has an assumed address order. For load or store operations that miss in the cache (and are marked as cacheable and, for stores, write-back in the MMU), the MPC750 uses the double-word-aligned address associated with the critical code or data that initiated the transaction. This minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the cache line is filled. For all other burst operations, however, the cache line is transferred beginning with the eight-word-aligned data.
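As a rough illustration of the burst address order, the sketch below lists the four double-word addresses of a burst. It assumes that, after the critical double word, the remaining beats follow sequentially and wrap around within the 32-byte line; this section does not spell out that wrap-around order, so treat it as an assumption. The function name burst_beat_addresses is hypothetical.

```python
LINE_BYTES = 32   # eight words per burst
BEAT_BYTES = 8    # one double word per beat on the 64-bit data bus

def burst_beat_addresses(address, critical_first=True):
    """Return the double-word address of each of the four burst beats.

    critical_first=True models a cache-miss line fill (starts at the
    double word containing `address`); False models other bursts, which
    start at the eight-word-aligned address.
    Assumption: beats after the first wrap sequentially within the line.
    """
    line_base = address & ~(LINE_BYTES - 1)
    beats = LINE_BYTES // BEAT_BYTES
    start = (address & (LINE_BYTES - 1)) // BEAT_BYTES if critical_first else 0
    return [line_base + ((start + i) % beats) * BEAT_BYTES for i in range(beats)]
```

For example, a load miss at 0x1014 would start at double word 0x1010 and wrap through 0x1018, 0x1000, and 0x1008 under this assumption.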
8.4.4 Data Transfer Termination
Four signals are used to terminate data bus transactions--TA, DRTRY (data retry), TEA (transfer error acknowledge), and ARTRY. The TA signal indicates normal termination of data transactions. It must always be asserted on the bus cycle coincident with the data that it is qualifying. It may be withheld by the slave for any number of clocks until valid data is ready to be supplied or accepted. DRTRY indicates invalid read data in the previous bus clock cycle. DRTRY extends the current data beat and does not terminate it. If it is asserted after the last (or only) data beat, the MPC750 negates DBB but still considers the data beat active and waits for another assertion of TA. DRTRY is ignored on write operations. TEA indicates a nonrecoverable bus error event. Upon receiving a final (or only) termination condition, the MPC750 always negates DBB for one cycle. If DRTRY is asserted by the memory system to extend the last (or only) data beat past the negation of DBB, the memory system should three-state the data bus on the clock after the final assertion of TA, even though it will negate DRTRY on that clock. This is to prevent a potential momentary data bus conflict if a write access begins on the following cycle. The TEA signal is used to signal a nonrecoverable error during the data transaction. It may be asserted on any cycle during DBB, or on the cycle after a qualified TA during a read operation, except when no-DRTRY mode is selected (where no-DRTRY mode cancels checking the cycle after TA). The assertion of TEA terminates the data tenure immediately even if in the middle of a burst; however, it does not prevent incorrect data that has just been acknowledged with TA from being written into the MPC750's cache or GPRs. The assertion of TEA initiates either a machine check exception or a checkstop condition based on the setting of the MSR[ME] bit. 
An assertion of ARTRY causes the data tenure to be terminated immediately if the ARTRY is for the address tenure associated with the data tenure in operation. If ARTRY is connected for the MPC750, the earliest allowable assertion of TA to the MPC750 is directly dependent on the earliest possible assertion of ARTRY to the MPC750; see Section 8.3.3, "Address Transfer Termination."
8.4.4.1 Normal Single-Beat Termination
Normal termination of a single-beat data read operation occurs when TA is asserted by a responding slave. The TEA and DRTRY signals must remain negated during the transfer (see Figure 8-9).
[Timing diagram: bus clocks 0-4; signals TS, qualified DBG, DBB, data, TA, DRTRY, AACK]
Figure 8-9. Normal Single-Beat Read Termination
The DRTRY signal is not sampled during data writes, as shown in Figure 8-10.
[Timing diagram: bus clocks 0-3; signals TS, qualified DBG, DBB, data, TA, DRTRY, AACK]
Figure 8-10. Normal Single-Beat Write Termination
8.4.4.2 Normal Burst Termination
Normal termination of a burst transfer occurs when TA is asserted for four bus clock cycles, as shown in Figure 8-11. The bus clock cycles in which TA is asserted need not be consecutive, thus allowing pacing of the data transfer beats. For read bursts to terminate successfully, TEA and DRTRY must remain negated during the transfer. For write bursts, TEA must remain negated for a successful transfer. DRTRY is ignored during data writes.
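The pacing rule can be illustrated with a short sketch that counts TA assertions over a sequence of bus clocks. It assumes TEA and DRTRY stay negated throughout, and it represents TA as one boolean sample per clock (True meaning asserted); the function name burst_completion_clock is hypothetical.

```python
def burst_completion_clock(ta_samples, beats=4):
    """Return the index of the clock on which the final beat is acknowledged.

    ta_samples: iterable of booleans, one per bus clock, True when TA is
                asserted. Clocks with TA negated are wait states.
    Returns None if fewer than `beats` assertions occur.
    Assumes TEA and DRTRY remain negated for the whole transfer.
    """
    acknowledged = 0
    for clock, ta in enumerate(ta_samples):
        if ta:
            acknowledged += 1
            if acknowledged == beats:
                return clock
    return None
```

Four back-to-back assertions complete the burst on the fourth sampled clock; each clock with TA negated pushes completion out by one clock.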
[Timing diagram: bus clocks 1-7; signals TS, qualified DBG, DBB, data, TA, DRTRY]
Figure 8-11. Normal Burst Transaction
For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal that the data presented with TA is invalid and that the MPC750 must wait for the negation of DRTRY before forwarding data to the processor core (see Figure 8-12). Thus, a data beat can be terminated provisionally with TA and then confirmed one bus clock cycle later by the negation of DRTRY. The DRTRY signal is valid only for read transactions. TA must be asserted on the bus clock cycle before the first bus clock cycle of the assertion of DRTRY; otherwise the results are undefined. The DRTRY signal extends data bus mastership such that other processors cannot use the data bus until DRTRY is negated. Therefore, in the example in Figure 8-12, DBB cannot be asserted until bus clock cycle 6. This is true for both read and write operations even though DRTRY does not extend bus mastership for write operations.
[Timing diagram: bus clocks 1-5; signals TS, qualified DBG, DBB, data, TA, DRTRY]
Figure 8-12. Termination with DRTRY
Figure 8-13 shows the effect of using DRTRY during a burst read. It also shows the effect of using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 8-13, TA is negated for the second data beat. The MPC750 data pipeline does not proceed until bus clock cycle 4 when the TA is reasserted.
[Timing diagram: bus clocks 1-9; signals TS, qualified DBG, DBB, data, TA, DRTRY]
Figure 8-13. Read Burst with TA Wait States and DRTRY
Note that DRTRY is useful for systems that implement predicted forwarding of data such as those with direct-mapped, third-level caches where hit/miss is determined on the following bus clock cycle, or for parity- or ECC-checked memory systems. Note that DRTRY may not be implemented on other processors that implement the PowerPC architecture.
8.4.4.3 Data Transfer Termination Due to a Bus Error
The TEA signal indicates that a bus error occurred. It may be asserted while DBB (and/or DRTRY for read operations) is asserted. Asserting TEA to the MPC750 terminates the transaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated. Assertion of the TEA signal causes a machine check exception (and possibly a checkstop condition within the MPC750). For more information, see Section 4.5.2, "Machine Check Exception (0x00200)." Note also that the MPC750 does not implement a synchronous error capability for memory accesses. This means that the exception instruction pointer saved into the SRR0 register does not point to the memory operation that caused the assertion of TEA, but to the instruction about to be executed (perhaps several instructions later). However, assertion of TEA does not invalidate data entering the GPR or the cache. Additionally, the address corresponding to the access that caused TEA to be asserted is not latched by the MPC750. To recover, the exception handler must determine and remedy the cause of the TEA, or the MPC750 must be reset; therefore, this function should only be used to indicate fatal system conditions to the processor (such as parity or uncorrectable ECC errors). After the MPC750 has committed to run a transaction, that transaction must eventually complete. Address retry causes the transaction to be restarted; TA wait states and DRTRY assertion for reads delay termination of individual data beats. Eventually, however, the system must either terminate the transaction or assert the TEA signal. For this reason, care must be taken to check for the end of physical memory and the location of certain system facilities to avoid memory accesses that result in the assertion of TEA. Note that TEA generates a machine check exception depending on MSR[ME]. Clearing the machine check exception enable control bits leads to a true checkstop condition (instruction execution halted and processor clock stopped).
8.4.5 Memory Coherency--MEI Protocol
The MPC750 provides dedicated hardware to provide memory coherency by snooping bus transactions. The address retry capability enforces the three-state, MEI cache-coherency protocol (see Figure 8-14). The global (GBL) output signal indicates whether the current transaction must be snooped by other snooping devices on the bus. Address bus masters assert GBL to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). If GBL is not asserted for the transaction, that transaction is not snooped. When other devices detect the GBL input asserted, they must respond by snooping the broadcast address. Normally, GBL reflects the M bit value specified for the memory reference in the corresponding translation descriptor(s). Note that care must be taken to minimize the
number of pages marked as global, because the retry protocol discussed in the previous section is used to enforce coherency and can require significant bus bandwidth. When the MPC750 is not the address bus master, GBL is an input. The MPC750 snoops a transaction if TS and GBL are asserted together in the same bus clock cycle (this is a qualified snooping condition). No snoop update to the MPC750 cache occurs if the snooped transaction is not marked global. This includes invalidation cycles. When the MPC750 detects a qualified snoop condition, the address associated with the TS is compared against the data cache tags. Snooping completes if no hit is detected. If, however, the address hits in the cache, the MPC750 reacts according to the MEI protocol shown in Figure 8-14, assuming the WIM bits are set to write-back, caching-allowed, and coherency-enforced modes (WIM = 001).
[State diagram with three states: INVALID, EXCLUSIVE, and MODIFIED. Arc labels: SH = snoop hit; RH = read hit; WH = write hit; WM = write miss; RM = read miss; SH/CRW = snoop hit, cacheable read/write; SH/CIR = snoop hit, caching-inhibited read. Bus transactions marked on the arcs: snoop push and cache line fill.]
Figure 8-14. MEI Cache Coherency Protocol--State Diagram (WIM = 001)
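A three-state protocol of this kind can be written as a transition table. The sketch below uses the event labels from the Figure 8-14 legend, but the individual arcs are an assumption based on the conventional MEI protocol, because the diagram itself did not survive in this text. The names State, TRANSITIONS, and step are hypothetical.

```python
from enum import Enum

class State(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"
    MODIFIED = "M"

# (current state, event) -> (next state, bus transaction or None).
# Assumed arcs for a conventional MEI protocol; not copied from the figure.
TRANSITIONS = {
    (State.INVALID, "RM"): (State.EXCLUSIVE, "cache line fill"),
    (State.INVALID, "WM"): (State.MODIFIED, "cache line fill"),
    (State.EXCLUSIVE, "RH"): (State.EXCLUSIVE, None),
    (State.EXCLUSIVE, "WH"): (State.MODIFIED, None),
    (State.EXCLUSIVE, "SH/CRW"): (State.INVALID, None),
    (State.EXCLUSIVE, "SH/CIR"): (State.EXCLUSIVE, None),
    (State.MODIFIED, "RH"): (State.MODIFIED, None),
    (State.MODIFIED, "WH"): (State.MODIFIED, None),
    (State.MODIFIED, "SH/CRW"): (State.INVALID, "snoop push"),
    (State.MODIFIED, "SH/CIR"): (State.EXCLUSIVE, "snoop push"),
}

def step(state, event):
    """Apply one event; raises KeyError for pairs the table does not define."""
    return TRANSITIONS[(state, event)]
```

For instance, under these assumed arcs a read miss fills the line as exclusive, a later write hit marks it modified, and a snooped cacheable access then pushes the line out and invalidates it.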
8.5 Timing Examples
This section shows timing diagrams for various scenarios. Figure 8-15 illustrates the fastest single-beat reads possible for the MPC750. This figure shows both minimal latency and maximum single-beat throughput. By delaying the data bus tenure, the latency increases, but, because of split-transaction pipelining, the overall throughput is not affected unless the data bus latency causes the third address tenure to be delayed. Note that all bidirectional signals are three-stated between bus tenures.
[Timing diagram: bus clocks 1-12; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA; three read transactions by CPU A, each with one "In" data beat]
Figure 8-15. Fastest Single-Beat Reads
Figure 8-16 illustrates the fastest single-beat writes supported by the MPC750. All bidirectional signals are three-stated between bus tenures.
[Timing diagram: bus clocks 1-12; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA; three SBW transactions by CPU A, each with one "Out" data beat]
Figure 8-16. Fastest Single-Beat Writes
Figure 8-17 shows three ways to delay single-beat reads using data-delay controls:
* The TA signal can remain negated to insert wait states in clock cycles 3 and 4.
* For the second access, DBG could have been asserted in clock cycle 6.
* In the third access, DRTRY is asserted in clock cycle 11 to flush the previous data.
Note that all bidirectional signals are three-stated between bus tenures. The pipelining shown in Figure 8-17 can occur if the second access is not another load (for example, an instruction fetch).
[Timing diagram: bus clocks 1-14; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA; three read transactions by CPU A; data beats labeled In, In, Bad, In]
Figure 8-17. Single-Beat Reads Showing Data-Delay Controls
Figure 8-18 shows data-delay controls in a single-beat write operation. Note that all bidirectional signals are three-stated between bus tenures. Data transfers are delayed in the following ways:
* The TA signal is held negated to insert wait states in clocks 3 and 4.
* In clock 6, DBG is held negated, delaying the start of the data tenure.
The last access is not delayed (DRTRY is valid only for read operations).
[Timing diagram: bus clocks 1-12; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA; three SBW transactions by CPU A, each with one "Out" data beat]
Figure 8-18. Single-Beat Writes Showing Data Delay Controls
Figure 8-19 shows the use of data-delay controls with burst transfers. Note that all bidirectional signals are three-stated between bus tenures. Note the following:
* The first data beat of bursted read data (clock 0) is the critical double word.
* The write burst shows the use of TA signal negation to delay the third data beat.
* The final read burst shows the use of DRTRY on the third data beat.
* The address for the third transfer is delayed until the first transfer completes.
[Timing diagram: bus clocks 1-12; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA]
Figure 8-19. Burst Transfers with Data Delay Controls
Figure 8-20 shows the use of the TEA signal. Note that all bidirectional signals are three-stated between bus tenures. Note the following:
* The first data beat of the read burst (in clock 0) is the critical double word.
* The TEA signal truncates the burst write transfer on the third data beat.
* The MPC750 eventually causes an exception to be taken on the TEA event.
[Timing diagram: bus clocks 1-17; signals BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, TEA; three transactions by CPU A: Read (data In 0 to In 3), Write (data Out 0 to Out 2), Read (data In 0 to In 3)]
Figure 8-20. Use of Transfer Error Acknowledge (TEA)
8.6 No-DRTRY Mode
The MPC750 supports an optional bus configuration that is selected by the assertion or negation of the DRTRY signal during the negation of the HRESET signal. The operation and selection of the optional bus configuration is described in the following sections. The MPC750 supports an optional mode to disable the use of the data retry function provided through the DRTRY signal. The no-DRTRY mode allows the forwarding of data during load operations to the internal CPU one bus cycle sooner than in the normal bus protocol.
The 60x bus protocol specifies that, during load operations, the memory system normally has the capability to cancel data that was read by the master on the bus cycle after TA was asserted. In the MPC750 implementation, this late cancellation protocol requires the MPC750 to hold any loaded data at the bus interface for one additional bus clock to verify that the data is valid before forwarding it to the internal CPU. For systems that do not implement the DRTRY function, the MPC750 provides an optional no-DRTRY mode that eliminates this one-cycle stall during all load operations, and allows for the forwarding of data to the internal CPU immediately when TA is recognized.
When the MPC750 is in the no-DRTRY mode, data can no longer be cancelled the cycle after it is acknowledged by an assertion of TA. Data is immediately forwarded to the CPU internally, and any attempt at late cancellation by the system may cause improper operation by the MPC750.
When the MPC750 is following normal bus protocol, data may be cancelled the bus cycle after TA by either of two means--late cancellation by DRTRY, or late cancellation by ARTRY. When no-DRTRY mode is selected, both cancellation cases must be disallowed in the system design for the bus protocol.
When no-DRTRY mode is selected for the MPC750, the system must ensure that DRTRY is not asserted to the MPC750. If it is asserted, it may cause improper operation of the bus interface. The system must also ensure that any assertion of ARTRY by a snooping device occurs before or coincident with the first assertion of TA to the MPC750, and not on the cycle after the first assertion of TA.
Other than the inability to cancel data that was read by the master on the bus cycle after TA was asserted, the bus protocol for the MPC750 in no-DRTRY mode is identical to the basic transfer bus protocols described in this chapter.
The MPC750 selects the desired DRTRY mode at startup by sampling the state of the DRTRY signal itself at the negation of the HRESET signal. If the DRTRY signal is negated at the negation of HRESET, normal operation is selected. If the DRTRY signal is asserted at the negation of HRESET, no-DRTRY mode is selected.
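The mode selection and the resulting ARTRY timing constraint can be summarized in a short sketch. It works with logical "asserted" values rather than pin voltage levels, and the normal-mode limit (ARTRY no later than the cycle after the first TA) is one reading of the late-cancellation description in this section. All function names are hypothetical.

```python
def select_bus_mode(drtry_asserted_at_hreset_negation):
    """Mode sampled at the negation of HRESET."""
    return "no-DRTRY" if drtry_asserted_at_hreset_negation else "normal"

def latest_artry_clock(first_ta_clock, mode):
    """Latest bus clock on which ARTRY may still cancel the data.

    Normal mode: assumed to allow the cycle after the first TA.
    No-DRTRY mode: no later than coincident with the first TA.
    """
    return first_ta_clock if mode == "no-DRTRY" else first_ta_clock + 1

def artry_timing_ok(artry_clock, first_ta_clock, mode):
    """True if a snooper's ARTRY at artry_clock meets the mode's constraint."""
    return artry_clock <= latest_artry_clock(first_ta_clock, mode)
```

A system design check could call artry_timing_ok with the worst-case snoop response clock to confirm that no-DRTRY mode is safe to select.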
8.7 Interrupt, Checkstop, and Reset Signal Operation
This section describes external interrupts, checkstop operations, and hard and soft reset inputs.
8.7.1 External Interrupts
The external interrupt input signals (INT, SMI and MCP) of the MPC750 force the processor to take the external interrupt vector or the system management interrupt vector if the MSR[EE] is set, or the machine check interrupt if the MSR[ME] and the HID0[EMCP] bits are set.
8.7.2 Checkstops
The MPC750 has two checkstop input signals--CKSTP_IN (nonmaskable) and MCP (enabled when MSR[ME] is cleared, and HID0[EMCP] is set), and a checkstop output (CKSTP_OUT) signal. If CKSTP_IN or MCP is asserted, the MPC750 halts operations by gating off all internal clocks. The MPC750 asserts CKSTP_OUT if CKSTP_IN is asserted. If CKSTP_OUT is asserted by the MPC750, it has entered the checkstop state, and processing has halted internally. The CKSTP_OUT signal can be asserted for various reasons including receiving a TEA signal and detection of external parity errors. For more information about checkstop state, see Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)."
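The enable conditions in Sections 8.7.1 and 8.7.2, together with the TEA behavior from Section 8.4.4, can be collected into one decision function. This is a simplified sketch: it handles one asserted input at a time, says nothing about priority among simultaneous inputs, and reports a masked input as "not taken". The function name respond_to_input and the result strings are hypothetical.

```python
def respond_to_input(signal, msr_ee=False, msr_me=False, hid0_emcp=False):
    """Return the processor's response to one asserted input signal."""
    if signal == "CKSTP_IN":
        return "checkstop"                      # nonmaskable
    if signal == "TEA":
        return "machine check" if msr_me else "checkstop"
    if signal == "MCP":
        if not hid0_emcp:
            return "not taken"                  # MCP disabled by HID0[EMCP] = 0
        return "machine check" if msr_me else "checkstop"
    if signal == "INT":
        return "external interrupt" if msr_ee else "not taken"
    if signal == "SMI":
        return "system management interrupt" if msr_ee else "not taken"
    raise ValueError("unknown signal: " + signal)
```

For example, respond_to_input("MCP", msr_me=False, hid0_emcp=True) returns "checkstop", matching the condition under which MCP acts as a checkstop input.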
8.7.3 Reset Inputs
The MPC750 has two reset inputs, described as follows:
* HRESET (hard reset)--The HRESET signal is used for power-on reset sequences, or for situations in which the MPC750 must go through the entire cold start sequence of internal hardware initializations.
* SRESET (soft reset)--The soft reset input provides warm reset capability. This input can be used to avoid forcing the MPC750 to complete the cold start sequence.
When either reset input negates, the processor attempts to fetch code from the system reset exception vector. The vector is located at offset 0x00100 from the exception prefix (all zeros or ones, depending on the setting of the exception prefix bit in the machine state register, MSR[IP]). The MSR[IP] bit is set for HRESET.
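As a worked example of the vector calculation, the sketch below forms the physical vector address from MSR[IP] and the offset. It assumes a 32-bit physical address in which the exception prefix occupies the upper 12 bits (0x000 or 0xFFF), following the usual PowerPC convention; the function name exception_vector is hypothetical.

```python
SYSTEM_RESET_OFFSET = 0x00100

def exception_vector(offset, msr_ip):
    """Physical address of an exception vector.

    msr_ip: value of MSR[IP]; set gives an all-ones prefix, clear gives
            an all-zeros prefix (12-bit prefix width is an assumption).
    """
    prefix = 0xFFF00000 if msr_ip else 0x00000000
    return prefix | offset
```

Because MSR[IP] is set for HRESET, the first fetch after a hard reset would come from exception_vector(SYSTEM_RESET_OFFSET, True), that is, 0xFFF00100.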
8.7.4 System Quiesce Control Signals
The system quiesce control signals (QREQ and QACK) allow the processor to enter the nap or sleep low-power states, and bring bus activity to a quiescent state in an orderly fashion. Prior to entering the nap or sleep power state, the MPC750 asserts the QREQ signal. This signal allows the system to terminate or pause any bus activities that are normally snooped. When the system is ready to enter the system quiesce state, it asserts the QACK signal. At this time the MPC750 may enter a quiescent (low power) state. When the MPC750 is in the quiescent state, it stops snooping bus activity. While the MPC750 is in the nap power state, the system power controller can enable snooping by the MPC750 by deasserting the QACK signal for at least eight bus clock cycles, after which the MPC750 is capable of snooping bus transactions. The reassertion of QACK following the snoop transactions will cause the MPC750 to reenter the nap power state.
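The eight-clock requirement for re-enabling snooping during nap can be illustrated as follows. The sketch takes one QACK sample per bus clock (True meaning asserted) and interprets "at least eight bus clock cycles" as eight consecutive deasserted samples, with snooping possible from the following clock; that interpretation and the function name first_snoop_capable_clock are assumptions.

```python
def first_snoop_capable_clock(qack_asserted, required=8):
    """Return the first clock index at which snooping is possible in nap.

    qack_asserted: iterable of booleans, one per bus clock, True when
                   QACK is asserted.
    Returns None if QACK is never deasserted for `required` clocks in a row.
    """
    run = 0
    for clock, asserted in enumerate(qack_asserted):
        run = 0 if asserted else run + 1   # count consecutive deasserted clocks
        if run == required:
            return clock + 1
    return None
```

A power controller model could use this to decide how long to hold QACK deasserted before presenting a transaction that must be snooped.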
8.8 Processor State Signals
This section describes the MPC750's support for atomic memory updates through the use of the lwarx/stwcx. instruction pair, and includes a description of the TLBISYNC input.
8.8.1 Support for the lwarx/stwcx. Instruction Pair
The Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed (stwcx.) instructions provide a means for atomic memory updating. Memory can be updated atomically by setting a reservation on the load and checking that the reservation is still valid before the store is performed. In the MPC750, the reservations are made on behalf of aligned, 32-byte sections of the memory address space. The reservation (RSRV) output signal is driven synchronously with the bus clock and reflects the status of the reservation coherency bit in the reservation address register; see Chapter 3, "L1 Instruction and Data Cache Operation," for more information. For information about timing, see Section 7.2.9.7.3, "Reservation (RSRV)--Output."
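The reservation mechanism can be modeled in a few lines. This is a simplified behavioral sketch with hypothetical names (Reservation, snoop_store): it tracks one reservation on a 32-byte granule, clears it when another master's store to that granule is snooped, and treats a stwcx. to a different granule as a failure, a case the PowerPC architecture leaves implementation-dependent.

```python
GRANULE_BYTES = 32

def granule(address):
    """Base address of the aligned 32-byte section containing `address`."""
    return address & ~(GRANULE_BYTES - 1)

class Reservation:
    def __init__(self):
        self.address = None            # None models no reservation held

    @property
    def rsrv(self):
        """Logical state of the RSRV output (True = reservation held)."""
        return self.address is not None

    def lwarx(self, address):
        self.address = granule(address)

    def snoop_store(self, address):
        """Another master stores to memory; a granule match clears the reservation."""
        if self.address == granule(address):
            self.address = None

    def stwcx(self, address):
        """Return True if the conditional store succeeds; always clears the reservation."""
        success = self.address is not None and self.address == granule(address)
        self.address = None
        return success
```

An atomic update loop would repeat the lwarx, modify, stwcx. sequence until stwcx. reports success.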
8.8.2 TLBISYNC Input
The TLBISYNC input allows for the hardware synchronization of changes to MMU tables when the MPC750 and another DMA master share the same MMU translation tables in system memory. It is asserted by a DMA master when it is using shared addresses that could be changed in the MMU tables by the MPC750 during the DMA master's tenure. The TLBISYNC input, when asserted to the MPC750, prevents the MPC750 from completing any instructions past a tlbsync instruction. Generally, during the execution of an eciwx or ecowx instruction by the MPC750, the selected DMA device should assert the MPC750's TLBISYNC signal and maintain it asserted during its DMA tenure if it is using a shared translation address. Subsequent instructions by the MPC750 should include a sync and tlbsync instruction before any MMU table changes are performed. This will prevent the MPC750 from making table changes disruptive to the other master during the DMA period.
8.9 IEEE 1149.1a-1993 Compliant Interface
The MPC750 boundary-scan interface is a fully-compliant implementation of the IEEE 1149.1a-1993 standard. This section describes the MPC750's IEEE 1149.1a-1993 (JTAG) interface.
8.9.1 JTAG/COP Interface
The MPC750 has extensive on-chip test capability including the following:
* Debug control/observation (COP)
* Boundary scan (standard IEEE 1149.1a-1993 (JTAG) compliant interface)
* Support for manufacturing test
The COP and boundary scan logic are not used under typical operating conditions. Detailed discussion of the MPC750 test functions is beyond the scope of this document; however,
sufficient information has been provided to allow the system designer to disable the test functions that would impede normal operation. The JTAG/COP interface is shown in Figure 8-21. For more information, refer to IEEE Standard Test Access Port and Boundary Scan Architecture, IEEE Std 1149.1a-1993.
TDI (Test Data Input) TMS (Test Mode Select) TCK (Test Clock Input) TDO (Test Data Output) TRST (Test Reset)
Figure 8-21. IEEE 1149.1a-1993 Compliant Boundary Scan Interface
8.10 Using Data Bus Write Only
The MPC750 supports split-transaction pipelined transactions. It supports a limited out-of-order capability for its own pipelined transactions through the data bus write only (DBWO) signal. When recognized on the clock of a qualified DBG, the assertion of DBWO directs the MPC750 to perform the next pending data write tenure (if any), even if a pending read tenure would normally have been performed because of address pipelining. The DBWO signal does not change the order of write tenures with respect to other write tenures from the same MPC750. It only allows a write tenure to be performed ahead of a pending read tenure from the same MPC750.
In general, an address tenure on the bus is followed strictly in order by its associated data tenure. Transactions pipelined by the MPC750 complete strictly in order. However, the MPC750 can run bus transactions out of order only when the external system allows the MPC750 to perform a cache-line-snoop-push-out operation (or other write transaction, if pending in the MPC750 write queues) between the address and data tenures of a read operation through the use of DBWO. This effectively envelops the write operation within the read operation. Figure 8-22 shows how the DBWO signal is used to perform an enveloped write transaction.
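[Timing diagram: address bus signals BG, ABB, and AACK showing the read address tenure (1) followed by the write address tenure (2); data bus signals DBG, DBB, and DBWO showing the write data tenure (2) performed before the read data tenure (1); the write is labeled "Enveloped Write Transaction"]
Figure 8-22. Data Bus Write Only Transaction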
Note that although the MPC750 can pipeline any write transaction behind the read transaction, special care should be taken when using the enveloped write feature. It is envisioned that most system implementations will not need this capability; for these applications, DBWO should remain negated. In systems where this capability is needed, DBWO should be asserted under the following scenario:
1. The MPC750 initiates a read transaction (either single-beat or burst) by completing the read address tenure with no address retry.
2. Then, the MPC750 initiates a write transaction by completing the write address tenure, with no address retry.
3. At this point, if DBWO is asserted with a qualified data bus grant to the MPC750, the MPC750 asserts DBB and drives the write data onto the data bus, out of order with respect to the address pipeline. The write transaction concludes with the MPC750 negating DBB.
4. The next qualified data bus grant signals the MPC750 to complete the outstanding read transaction by latching the data on the bus. This assertion of DBG should not be accompanied by an asserted DBWO.
Any number of bus transactions by other bus masters can be attempted between any of these steps. Note the following regarding DBWO:
* DBWO can be asserted if no data bus read is pending, but it has no effect on write ordering.
* The ordering and presence of data bus writes is determined by the writes in the write queues at the time BG is asserted for the write address (not DBG). If a particular write is desired (for example, a cache-line-snoop-push-out operation), then BG must be asserted after that particular write is in the queue and it must be the highest priority write in the queue at that time. A cache-line-snoop-push-out operation may be the highest priority write, but more than one may be queued.
* Because more than one write may be in the write queue when BG is asserted for the write address, more than one data bus write may be enveloped by a pending data bus read.
The arbiter must monitor bus operations and coordinate the various masters and slaves with respect to the use of the data bus when DBWO is used. Individual DBG signals associated with each bus device should allow the arbiter to synchronize both pipelined and split-transaction bus organizations. Individual DBG and DBWO signals provide a primitive form of source-level tagging for the granting of the data bus. Note that use of the DBWO signal allows some operation-level tagging with respect to the MPC750 and the use of the data bus.
Chapter 9 L2 Cache Interface Operation
This chapter describes the MPC750 microprocessor L2 cache interface, and its configuration and operation. It describes how the MPC750 signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers to and from the L2 cache. Note that the MPC740 microprocessor does not implement the L2 cache interface. Also, note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
9.1 L2 Cache Interface Overview
The MPC750's L2 cache interface is implemented with an on-chip, two-way set associative tag memory with 4096 tags per way, and a dedicated interface with support for up to 1 Mbyte of external synchronous SRAM for data storage. The tags are sectored to support either two cache blocks per tag entry (two sectors, 64 bytes), or four cache blocks per tag entry (four sectors, 128 bytes) depending on the L2 cache size. If the L2 cache is configured for 256 Kbytes or 512 Kbytes of external SRAM, the tags are configured for two sectors per L2 cache block. The L2 tags are configured for four sectors per L2 cache block when 1 Mbyte of external SRAM is used. Each sector (32-byte L1 cache block) in the L2 cache has its own valid and modified bits and other status bits that implement the MEI cache coherency protocol. The L2 cache control register (L2CR) allows control of L2 cache configuration and timing, byte-level data parity generation and checking, global invalidation of L2 contents, write-through operation, and L2 test support. The L2 cache interface provides two clock outputs that allow the clock inputs of the SRAMs to be driven at select frequency divisions of the processor core frequency. The MPC750's L2 cache normally is configured to operate in copy-back mode and maintains cache coherency through snooping. Figure 9-1 shows the MPC750 configured with a 1-Mbyte L2 cache.
L2ADDR[16-0] L2DATA[0-63] L2DP[0-7] L2CE L2WE L2ZZ
(Optional) (Optional)
0 1
L2CLK_OUTA MPC750 L2SYNC_OUT L2SYNC_IN 0 1 L2CLK_OUTB (Optional)
ADDR[16-0] DATA[0-31] PARITY[0-3] E 128k x 36 W SRAM ADSC ADSP ZZ K
ADDR[16-0] DATA[0-31] PARITY[0-3] E 128k x 36 W SRAM ADSC ADSP ZZ K
Notes: - For a 1-Mbyte L2, use address bits 16-0 (bit 0 is LSB). - For a 512-Kbyte L2, use address bits 15-0 (bit 0 is LSB). - For a 256-Kbyte L2, use address bits 14-0 (bit 0 is LSB). - External clock routing should ensure that the rising edge of the L2 clock is coincident at the K input of all SRAMs and at the L2Sync_In input of the MPC750. The clock A network can be used solely or the clock B network can also be used depending on loading, frequency, and number of SRAMs. - No pull-up resistors are normally required for the L2 interface. - The MPC750 supports only one bank of SRAMs. - For high-speed operation, no more than two loads should be presented on each L2 interface signal.
Figure 9-1. Typical 1-Mbyte L2 Cache Configuration
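The sizing rules in the overview and the address-bit notes for Figure 9-1 follow from simple arithmetic. The Python sketch below runs on a host machine, not on the MPC750. It derives the sectors per tag, the tag entries per way a given size needs, and the SRAM address width. The names l2_geometry and L2Geometry are illustrative assumptions. The figure of 2048 tag entries per way for a 256-Kbyte cache is an arithmetic inference, not a statement from the manual.

```python
from dataclasses import dataclass

WAYS = 2                 # two-way set-associative tag memory
SECTOR_BYTES = 32        # one sector = one 32-byte L1 cache block
DATA_BUS_BYTES = 8       # L2DATA[0-63] is 64 bits wide
MAX_TAGS_PER_WAY = 4096  # on-chip tag entries per way


@dataclass(frozen=True)
class L2Geometry:
    sectors_per_tag: int
    tags_per_way_used: int
    sram_address_bits: int


def l2_geometry(size_kbytes: int) -> L2Geometry:
    """Derive L2 geometry for a supported size (256, 512, or 1024 Kbytes)."""
    if size_kbytes not in (256, 512, 1024):
        raise ValueError("supported L2 sizes are 256, 512, and 1024 Kbytes")
    size_bytes = size_kbytes * 1024
    # Four sectors per tag only for the 1-Mbyte configuration.
    sectors = 4 if size_kbytes == 1024 else 2
    tags = size_bytes // (WAYS * sectors * SECTOR_BYTES)
    if tags > MAX_TAGS_PER_WAY:
        raise ValueError("configuration exceeds the on-chip tag memory")
    # One SRAM address per 64-bit word of data.
    words = size_bytes // DATA_BUS_BYTES
    return L2Geometry(sectors, tags, words.bit_length() - 1)
```

For example, l2_geometry(1024) reports 17 address bits, which corresponds to the L2ADDR[16-0] range given in the notes.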
9.1.1 L2 Cache Operation
The MPC750's L2 cache is a combined instruction and data cache that receives memory requests from both L1 instruction and data caches independently. The L1 requests are generally the result of instruction fetch misses, data load or store misses, write-through operations, or cache management instructions. Each L1 request generates an address lookup in the L2 tags. If a hit occurs, the instructions or data are forwarded to the L1 cache. A miss in the L2 tags causes the L1 request to be forwarded to the 60x bus interface. The cache block received from the bus is forwarded to the L1 cache immediately, and is also loaded into the L2 cache with the tag marked valid and unmodified. If the cache block loaded into the L2 causes a new tag entry to be allocated and the current tag entry is marked valid modified, the modified sectors of the tag to be replaced are castout from the L2 cache to the 60x bus. At any given time the L1 instruction cache may have one instruction fetch request, and the L1 data cache may have one load and two stores requesting L2 cache access. The L2 cache also services snoop requests from the 60x bus. When there are multiple pending requests to
the L2 cache, snoop requests have highest priority, followed by data load and store requests (serviced on a first-in, first-out basis). Instruction fetch requests have the lowest priority in accessing the L2 cache when there are multiple accesses pending.
If read requests from both the L1 instruction and data caches are pending, the L2 cache can perform hit-under-miss and supplies the available instruction or data while a bus transaction for the previous L2 cache miss is performed. The L2 cache does not support miss-under-miss, and the second instruction fetch or data load stalls until the bus operation resulting from the first L2 miss completes.
All requests to the L2 cache that are marked cacheable (even if the respective L1 cache is disabled or locked) cause tag lookup and will be serviced if the instructions or data are in the L2 cache. Burst and single-beat read requests from the L1 caches that hit in the L2 cache are forwarded instructions or data, and the L2 LRU bit for that tag is updated.
Burst writes from the L1 data cache due to a castout or replacement copyback are written only to the L2 cache, and the L2 cache sector is marked modified. Designers should note that during burst transfers into and out of the L2 cache SRAM array an address is generated by the MPC750 for each data beat. If the L2 cache is configured as write-through, the L2 sector is marked unmodified, and the write is forwarded to the 60x bus. If the L1 castout requires a new L2 tag entry to be allocated and the current tag is marked modified, any modified sectors of the tag to be replaced are cast out of the L2 cache to the 60x bus.
Single-beat read requests from the L1 caches that miss in the L2 cache do not cause any state changes in the L2 cache and are forwarded on the 60x bus interface. Cacheable single-beat store requests marked copy-back that hit in the L2 are allowed to update the L2 cache sector, but do not cause L2 cache sector allocation or deallocation.
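The request priority described in this section (snoops first, then data loads and stores in arrival order, then instruction fetches) can be illustrated with a toy arbiter. This is a behavioral sketch only. The class and method names are assumptions, and it does not model the limits on outstanding requests or the hit-under-miss behavior.

```python
from collections import deque

# Request classes, listed from highest to lowest priority.
SNOOP, DATA, IFETCH = 0, 1, 2


class L2Arbiter:
    """Toy model of L2 request arbitration (names are illustrative)."""

    def __init__(self):
        self.snoops = deque()
        self.data = deque()    # loads and stores share one FIFO
        self.ifetch = deque()

    def post(self, kind, tag):
        """Queue a pending request of the given class."""
        queue = {SNOOP: self.snoops, DATA: self.data, IFETCH: self.ifetch}[kind]
        queue.append(tag)

    def next_request(self):
        """Return the next request to service, or None if nothing is pending."""
        for queue in (self.snoops, self.data, self.ifetch):
            if queue:
                return queue.popleft()
        return None
```

Posting an instruction fetch, a load, a store, and a snoop in that order and then draining the arbiter yields the snoop first, the load and store in arrival order, and the instruction fetch last.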
Cacheable, single-beat store requests that miss in the L2 are forwarded to the 60x bus. Single-beat store requests marked write-through (through address translation or through the configuration of L2CR[L2WT]) are written to the L2 cache if they hit and are written to the 60x bus independent of the L2 hit/miss status. If the store hits in the L2 cache, the modified/unmodified status of the tag remains unchanged.
All requests to the L2 cache that are marked cache-inhibited by address translation (through either the MMU or by default WIMG configuration) bypass the L2 cache and do not cause any L2 cache tag state change.
The execution of the stwcx. instruction results in single-beat writes from the L1 data cache. These single-beat writes are processed by the L2 cache according to hit/miss status, L1 and L2 write-through configuration, and reservation-active status. If the address associated with the stwcx. instruction misses in the L2 cache or if the reservation is no longer active, the stwcx. instruction bypasses the L2 cache and is forwarded to the 60x bus interface. If the stwcx. hits in the L2 cache and the reservation is still active, one of the following actions occurs:
* If the stwcx. hits a modified sector in the L2 cache (independent of write-through status), or if the stwcx. hits both the L1 and L2 caches in copy-back mode, the stwcx. is written to the L2 and the reservation completes.
* If the stwcx. hits an unmodified sector in the L2 cache, and either the L1 or L2 is in write-through mode, the stwcx. is forwarded to the 60x bus interface and the sector hit in the L2 cache is invalidated.
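The stwcx. rules above can be written as a small decision function. This sketch rests on one interpretation: "hits both the L1 and L2 caches in copy-back mode" is read as an L1 hit with neither cache in write-through mode. One combination (unmodified L2 sector, L1 miss, both caches copy-back) is not covered by the text, so the function reports it as unspecified instead of guessing. All names are illustrative.

```python
from enum import Enum


class StwcxAction(Enum):
    BYPASS_TO_BUS = "bypass L2, forward to 60x bus"
    WRITE_L2 = "write to L2, reservation completes"
    BUS_AND_INVALIDATE = "forward to 60x bus, invalidate L2 sector"
    UNSPECIFIED = "combination not covered by the text"


def stwcx_action(l2_hit, reservation_active, sector_modified,
                 l1_hit, l1_write_through, l2_write_through):
    """Classify how the L2 handles a stwcx. single-beat write."""
    if not l2_hit or not reservation_active:
        return StwcxAction.BYPASS_TO_BUS
    # Assumption: "copy-back mode" means neither cache is write-through.
    copy_back = not (l1_write_through or l2_write_through)
    if sector_modified or (l1_hit and copy_back):
        return StwcxAction.WRITE_L2
    if not copy_back:
        # Unmodified sector with either cache in write-through mode.
        return StwcxAction.BUS_AND_INVALIDATE
    return StwcxAction.UNSPECIFIED
```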
9.1.2 L2 Cache Flushing
L1 cache-block-push operations generated by the execution of dcbf and dcbst instructions write through to the 60x bus interface and invalidate the L2 cache sector if they hit. The execution of dcbf and dcbst instructions that do not cause a cache-block-push from the L1 cache are forwarded to the L2 cache to perform a sector invalidation and/or push from the L2 cache to the 60x bus as required. If the dcbf and dcbst instructions do not cause a sector push from the L2 cache, they are forwarded to the 60x bus interface for address-only broadcast if HID0[ABE] is set to 1.
The dcbi instruction is always forwarded to the L2 cache and causes a sector invalidation if a hit occurs. The dcbi instruction is also forwarded to the 60x bus interface for broadcast if HID0[ABE] is set to 1. The icbi instruction invalidates only L1 cache blocks and is never forwarded to the L2 cache.
Any dcbz instructions marked global do not affect the L2 cache state. If a dcbz instruction hits in the L1 and L2 caches, the L1 data cache block is cleared and the dcbz instruction completes. If a dcbz instruction misses in the L2 cache, it is forwarded to the 60x bus interface for broadcast. Any dcbz instructions that are marked nonglobal act only on the L1 data cache.
The sync and eieio instructions bypass the L2 cache and are forwarded to the 60x bus.
9.1.3 L2 Cache Control Register (L2CR)
The L2 cache control register is used to configure and enable the L2 cache. The L2CR is a supervisor-level read/write, implementation-specific register that is accessed as SPR 1017. The contents of the L2CR are cleared during power-on reset. Table 9-1 describes the L2CR bits. For additional information about the configuration of the L2CR, refer to Section 2.1.5, "L2 Cache Control Register (L2CR)."
Table 9-1. L2 Cache Control Register
Bits  Name  Function
0  L2E  L2 enable. Enables L2 cache operation (including snooping) starting with the next transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be configured through L2CR[L2CLK], and the L2 DLL must stabilize (see the hardware specifications). All other L2CR bits must be set appropriately. The L2 cache may need to be invalidated globally.
1  L2PE  L2 data parity generation and checking enable. Enables parity generation and checking for the L2 data RAM interface. When disabled, generated parity is always zeros.
2-3  L2SIZ  L2 size--Should be set according to the size of the L2 data RAMs used. A 256-Kbyte L2 cache requires a data RAM configuration of 32K x 64 bits; a 512-Kbyte L2 cache requires a configuration of 64K x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128K x 64 bits.
    00 Reserved
    01 256 Kbyte
    10 512 Kbyte
    11 1 Mbyte
4-6  L2CLK  L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio based from the core clock frequency that the L2 data RAM interface is to operate at. When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For nonzero values, the processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must stabilize before the L2 interface can be enabled (see the hardware specifications). The resulting L2 clock frequency cannot be slower than the clock frequency of the 60x bus interface.
    000 L2 clock and DLL disabled
    001 /1
    010 /1.5
    011 Reserved
    100 /2
    101 /2.5
    110 /3
    111 Reserved
7-8  L2RAM  L2 RAM type--Configures the L2 RAM interface for the type of synchronous SRAMs used:
    * Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow data out
    * Pipelined (register-register) synchronous burst SRAMs that clock addresses in and clock data out
    * Late-write synchronous SRAMs, for which the MPC750 requires a pipelined (register-register) configuration. Late-write RAMs require write data to be valid on the cycle after WE is asserted, rather than on the same cycle as the write enable as with traditional burst RAMs.
    For burst RAM selections, the MPC750 does not burst data into the L2 cache; it generates an address for each access. Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used only for L2 clock modes divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed).
    00 Flow-through (register-buffer) synchronous burst SRAM
    01 Reserved
    10 Pipelined (register-register) synchronous burst SRAM
    11 Pipelined (register-register) synchronous late-write SRAM
9  L2DO  L2 data-only. Setting this bit enables data-only operation in the L2 cache. For this operation, only transactions from the L1 data cache can be cached in the L2 cache, which treats all transactions from the L1 instruction cache as cache-inhibited (bypass L2 cache, no L2 checking done). This bit is provided for L2 testing only.
10  L2I  L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 bits including status bits. This bit must not be set while the L2 cache is enabled.
11  L2CTL  L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function. While L2CTL is asserted, L2ZZ asserts automatically when the MPC750 enters nap or sleep mode and negates automatically when the MPC750 exits nap or sleep mode. This bit should not be set when the MPC750 is in nap mode and snooping is to be performed through deassertion of QACK.
12  L2WT  L2 write-through. Setting L2WT selects write-through mode (rather than the default write-back mode) so all writes to the L2 cache also write through to the 60x bus. For these writes, the L2 cache entry is always marked as clean (valid unmodified) rather than dirty (valid modified). This bit must never be asserted after the L2 cache has been enabled as previously-modified lines can get remarked as clean during normal operation.
13  L2TS  L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of hit. This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily initialize the L2 cache with any address and data information. This bit also keeps dcbz instructions from being broadcast on the 60x bus and single-beat cacheable store misses in the L2 from being written to the 60x bus.
14-15  L2OH  L2 output hold. These bits configure output hold time for address, data, and control signals driven by the MPC750 to the L2 data RAMs. They should generally be set according to the SRAM's input hold time requirements, for which late-write SRAMs usually differ from flow-through or burst SRAMs.
    00 0.5 ns
    01 1.0 ns
    1x Reserved
16  L2SL  L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 150 MHz.
17  L2DF  L2 differential clock. Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, the B clock is driven as the logical complement of the A clock. This mode supports the differential clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs are used.
18  L2BYP  L2 DLL bypass. The DLL unit receives three input clocks:
    1. A square-wave clock from the PLL unit to phase adjust and export
    2. A non-square-wave clock for the internal phase reference
    3. A feedback clock (L2SYNC_IN) for the external phase reference
    Asserting L2BYP causes clock #2 to be used as clocks #1 and #2. (Clock #2 is the actual clock used by the registers of the L2 interface circuitry.) L2BYP is intended for use when the PLL is being bypassed, and for engineering evaluation. If the PLL is being bypassed, the DLL must be operated in divide-by-1 mode, and SYSCLK must be fast enough for the DLL to support.
19-30  --  Reserved. These bits are implemented but not used; keep at 0 for future compatibility.
31  L2IP  L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been initiated by the L2I bit to determine when it has completed.
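As an illustration of how the Table 9-1 fields combine into a 32-bit register value, the following Python sketch builds L2CR values on a host machine. It assumes the usual PowerPC numbering in which bit 0 is the most significant bit. The helper names (bit, field, l2cr_value) and the dictionary keys are illustrative, not part of any Motorola software.

```python
def bit(n: int) -> int:
    """Mask for bit n, where bit 0 is the most significant of 32 bits."""
    return 1 << (31 - n)


def field(value: int, first: int, last: int) -> int:
    """Place value into the field spanning bits first..last (inclusive)."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in field")
    return value << (31 - last)


# Single-bit controls from Table 9-1.
L2E, L2PE = bit(0), bit(1)
L2DO, L2I, L2CTL, L2WT, L2TS = bit(9), bit(10), bit(11), bit(12), bit(13)
L2SL, L2DF, L2BYP, L2IP = bit(16), bit(17), bit(18), bit(31)

# Multi-bit encodings from Table 9-1 (reserved values omitted).
L2SIZ = {"256K": 0b01, "512K": 0b10, "1M": 0b11}
L2CLK = {"off": 0b000, "/1": 0b001, "/1.5": 0b010,
         "/2": 0b100, "/2.5": 0b101, "/3": 0b110}
L2RAM = {"flow-through": 0b00, "pipelined": 0b10, "late-write": 0b11}


def l2cr_value(size, clock, ram, output_hold=0b00, flags=0):
    """Compose an L2CR value; flags is an OR of single-bit masks."""
    return (field(L2SIZ[size], 2, 3)
            | field(L2CLK[clock], 4, 6)
            | field(L2RAM[ram], 7, 8)
            | field(output_hold, 14, 15)
            | flags)
```

A configuration value would normally be written with L2E clear, with L2E set only after the DLL has locked and the tags have been invalidated.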
9.1.4 L2 Cache Initialization
Following a power-on or hard reset, the L2 cache and the L2 DLL are disabled initially. Before enabling the L2 cache, the L2 DLL must first be configured through the L2CR register, and the DLL must be allowed 640 L2 clock periods to achieve phase lock. Before enabling the L2 cache, other configuration parameters must be set in the L2CR, and the L2 tags must be globally invalidated. The L2 cache should be initialized during system start-up. The sequence for initializing the L2 cache is as follows:
* Power-on reset (automatically performed by the assertion of HRESET signal).
* Disable L2 cache by clearing L2CR[L2E].
* Set the L2CR[L2CLK] bits to the desired clock divider setting. Setting a nonzero value automatically enables the DLL. All other L2 cache configuration bits should be set to properly configure the L2 cache interface for the SRAM type, size, and interface timing required.
* Wait for the L2 DLL to achieve phase lock. This can be timed by setting the decrementer for a time period equal to 640 L2 clocks, or by performing an L2 global invalidate.
* Perform an L2 global invalidate. The global invalidate could be performed before enabling the DLL, or in parallel with waiting for the DLL to stabilize. Refer to Section 9.1.5, "L2 Cache Global Invalidation," for more information about L2 cache global invalidation. Note that a global invalidate always takes much longer than it takes for the DLL to stabilize.
* After the DLL stabilizes, an L2 global invalidate has been performed, and the other L2 configuration bits have been set, enable the L2 cache for normal operation by setting the L2CR[L2E] bit to 1.
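When the DLL lock wait is timed with the decrementer, the 640 L2 clock periods have to be converted into decrementer ticks. The sketch below shows that conversion with exact rational arithmetic. The decrementer rate is not given in this section, so timebase_hz is a caller-supplied assumption, and the function name is illustrative.

```python
from fractions import Fraction
from math import ceil

DLL_LOCK_L2_CLOCKS = 640  # lock time stated in the text


def dll_lock_ticks(core_hz: int, l2_divider, timebase_hz: int) -> int:
    """Decrementer ticks that cover 640 L2 clock periods, rounded up."""
    divider = Fraction(l2_divider)
    if divider not in (1, Fraction(3, 2), 2, Fraction(5, 2), 3):
        raise ValueError("unsupported L2 clock divider")
    l2_hz = Fraction(core_hz) / divider
    seconds = DLL_LOCK_L2_CLOCKS / l2_hz
    return ceil(seconds * timebase_hz)
```

With a hypothetical 400-MHz core, a divide-by-2 L2 clock, and a 25-MHz decrementer rate, the 640 L2 clocks last 3.2 microseconds, so the function returns 80 ticks.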
9.1.5 L2 Cache Global Invalidation
The L2 cache supports a global invalidation function in which all bits of the L2 tags (tag data bits, tag status bits, and LRU bit) are cleared. It is performed by an on-chip hardware state machine that sequentially cycles through the L2 tags. The global invalidation function is controlled through L2CR[L2I], and it must be performed only while the L2 cache is disabled. The MPC750 can continue operation during a global invalidation provided the L2 cache has been properly disabled before the global invalidation operation starts. Note that the MPC750 must be operating at full power (low power modes disabled) in order to perform L2 cache invalidation. The sequence for performing a global invalidation of the L2 cache is as follows:
* Clear HID0[DPM] bit to zero. Dynamic power management must be disabled.
* Execute a sync instruction to finish any pending store operations in the load/store unit, disable the L2 cache by clearing L2CR[L2E], and execute an additional sync instruction after disabling the L2 cache to ensure that any pending operations in the L2 cache unit have completed.
* Initiate the global invalidation operation by setting the L2CR[L2I] bit to 1.
* Monitor the L2CR[L2IP] bit to determine when the global invalidation operation is completed (indicated by the clearing of L2CR[L2IP]). The global invalidation requires approximately 32K core clock cycles to complete.
* After detecting the clearing of L2CR[L2IP], clear L2CR[L2I] and re-enable the L2 cache for normal operation by setting L2CR[L2E]. Also, dynamic power management can be enabled at this time.
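The ordering of the steps above can be checked against a simulated processor. The sketch below is a host-side model, not target code. FakeCpu and its methods are hypothetical stand-ins for the mtspr/mfspr accesses to SPR 1017 and for the sync instruction, and HID0[DPM] is modeled as a plain flag because its bit position is not given in this section. The L2CR masks assume that bit 0 is the most significant bit.

```python
L2E = 1 << 31   # L2CR bit 0
L2I = 1 << 21   # L2CR bit 10
L2IP = 1 << 0   # L2CR bit 31 (read-only)


class FakeCpu:
    """Hypothetical stand-in that records the order of operations."""

    def __init__(self, busy_polls=3):
        self.l2cr = L2E      # start with the L2 cache enabled
        self.dpm = True      # HID0[DPM] modeled as a flag
        self.log = []
        self._busy_polls = busy_polls
        self._busy = 0

    def sync(self):
        self.log.append("sync")

    def set_dpm(self, enabled):
        self.dpm = enabled
        self.log.append(f"dpm={int(enabled)}")

    def read_l2cr(self):
        if self._busy:       # invalidate still in progress
            self._busy -= 1
            return self.l2cr | L2IP
        return self.l2cr

    def write_l2cr(self, value):
        if value & L2I and not self.l2cr & L2I:
            self._busy = self._busy_polls   # a new invalidate starts
        self.l2cr = value & ~L2IP           # L2IP cannot be written
        self.log.append(f"l2cr={self.l2cr:#010x}")


def l2_global_invalidate(cpu):
    """Follow the documented sequence on a FakeCpu-like object."""
    dpm_was_enabled = cpu.dpm
    cpu.set_dpm(False)                       # DPM must be off
    cpu.sync()                               # drain pending stores
    cpu.write_l2cr(cpu.read_l2cr() & ~L2E)   # disable the L2 cache
    cpu.sync()                               # drain the L2 cache unit
    cpu.write_l2cr(cpu.read_l2cr() | L2I)    # start the invalidate
    while cpu.read_l2cr() & L2IP:            # poll until L2IP clears
        pass
    cpu.write_l2cr(cpu.read_l2cr() & ~L2I)   # clear L2I
    cpu.write_l2cr(cpu.read_l2cr() | L2E)    # re-enable the L2 cache
    if dpm_was_enabled:
        cpu.set_dpm(True)                    # optional DPM restore
```

On real hardware the polling loop would be a tight loop reading the L2CR, in line with the cautions about dynamic power management in this section.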
If dynamic power management is enabled (HID0[DPM] = 1), a global invalidate of the L2 cache may not properly invalidate the L2 tag memory during the time that the L1 data cache is waiting for reload data to be received from system memory. During that time, circuitry in the L1 data cache is stopped to conserve power, which inadvertently affects the state machine performing the L2 global invalidate operation. There are two ways to avoid this:
* Be sure DPM = 0 during an L2 cache global invalidation.
* Ensure that the processor is in a tight uninterruptible software loop monitoring the end of the global invalidate, so that an L1 data cache miss cannot occur that would initiate a reload from system memory during the global invalidate operation.
9.1.6 L2 Cache Test Features and Methods
In the course of system power-up, testing may be required to verify the proper operation of the L2 tag memory, external SRAM, and overall L2 cache system. The following sections describe the MPC750's features and methods for testing the L2 cache. The L2 cache address space should be marked as guarded (G = 1) so spurious load operations are not forwarded to the 60x bus interface before branch resolution during L2 cache testing.
9.1.6.1 L2CR Support for L2 Cache Testing
L2CR[L2DO] and L2CR[L2TS] support the testing of the L2 cache. L2CR[L2DO] prevents instructions from being cached in the L2. This allows the L1 instruction cache to remain enabled during the testing process without having L1 instruction misses affect the contents of the L2 cache and allows all L2 cache activity to be controlled by program-specified load and store operations.
L2CR[L2TS] is used with the dcbf and dcbst instructions to push data into the L2 cache. When L2CR[L2TS] is set, and the L1 data cache is enabled, an instruction loop containing a dcbf instruction can be used to store any address or data pattern to the L2 cache. Additionally, 60x bus broadcasting is inhibited when a dcbz instruction is executed. This allows the use of a dcbz instruction to clear an L1 cache block, followed by a dcbf instruction to push the cache block into the L2 cache and invalidate the L1 cache block.
When the L2 cache is enabled, cacheable single-beat read operations are allowed to hit in the L2 cache and cacheable write operations are allowed to modify the contents of the L2 cache when a hit occurs. Cacheable single-beat reads and writes occur when address translation is disabled (invoking the use of the default WIMG bits (0b0011)), or when address translation is enabled and accesses are marked as cacheable through the page table entries or the BATs, and the L1 data cache is disabled or locked. When the L2 cache has been initialized and the L1 cache has been disabled or locked, load or store instructions then bypass the L1 cache and hit in the L2 cache directly. When L2CR[L2TS] is set, cacheable single-beat writes are inhibited from accessing the 60x bus interface after an L2 cache miss.
During L2 cache testing, the performance monitor can be used to count L2 cache hits and misses, thereby providing a numerical signature for test routines and a way to verify proper L2 cache operation.
9.1.6.2 L2 Cache Testing
A typical test for verifying the proper operation of the MPC750's L2 cache memory (external SRAM and tag) would perform the following steps:
* Initialize the L2 test sequence by disabling address translation to invoke the default WIMG setting (0b0011). Set L2CR[L2DO] and L2CR[L2TS] and perform a global invalidation of the L1 data cache and the L2 cache. The L1 instruction cache can remain enabled to improve execution efficiency.
* Test the L2 cache external SRAM by enabling the L1 data cache and executing a sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a desired range of consecutive addresses and with cache data consisting of zeros. Once the L2 cache holds a sequential range of addresses, disable the L1 data cache and execute a series of single-beat load and store operations employing a variety of bit patterns to test for stuck bits and pattern sensitivities in the L2 cache SRAM. The performance monitor can be used to verify whether the number of L2 cache hits or misses corresponds to the tests performed.
* Test the L2 cache tag memory by enabling the L1 data cache and executing a sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a wide range of addresses and cache data. Once the L2 cache is populated with a known range of addresses and data, disable the L1 data cache and execute a series of store operations to addresses not previously in the L2 cache. These store operations should miss in every case. Note that setting L2CR[L2TS] inhibits L2 cache misses from being forwarded to the 60x bus interface, thereby avoiding the potential for bus errors due to addressing hardware or nonexistent memory. The L2 cache then can be further verified by reading the previously loaded addresses and observing whether all the tags hit, and that the associated data compares correctly. The performance monitor can also be used to verify whether the proper number of L2 cache hits and misses correspond to the test operations performed.
* The entire L2 cache can be tested by clearing L2CR[L2DO] and L2CR[L2TS], restoring the L1 and L2 caches to their normal operational state, and executing a comprehensive test program designed to exercise all the caches. The test program should include operations that cause L2 hit, reload, and castout activity that can be subsequently verified through the performance monitor.
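The manual does not prescribe specific bit patterns for the SRAM test step. As one possible choice, the sketch below uses walking-one and walking-zero patterns on 64-bit words, a common technique for finding stuck bits. It runs against a simulated memory. The write and read callables stand in for single-beat stores and loads issued with the L1 data cache disabled, and FakeSram with its stuck_low parameter is purely hypothetical.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def walking_patterns():
    """Yield walking-one and walking-zero 64-bit patterns."""
    for i in range(WORD_BITS):
        yield 1 << i
        yield ~(1 << i) & MASK


def find_stuck_bits(write, read, addresses):
    """Return {address: mask of bits that failed to read back}."""
    failures = {}
    for addr in addresses:
        bad = 0
        for pattern in walking_patterns():
            write(addr, pattern)
            bad |= read(addr) ^ pattern
        if bad:
            failures[addr] = bad
    return failures


class FakeSram:
    """Simulated SRAM with optional stuck-at-0 bits per address."""

    def __init__(self, stuck_low=None):
        self.cells = {}
        self.stuck_low = stuck_low or {}

    def write(self, addr, value):
        self.cells[addr] = value & ~self.stuck_low.get(addr, 0) & MASK

    def read(self, addr):
        return self.cells[addr]
```

Running find_stuck_bits over a FakeSram that has bit 5 stuck low at one address reports that address with a mask of 1 << 5 and no failures elsewhere.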
9.1.7 L2 Clock Configuration
The MPC750 provides a programmable clock for the L2 external synchronous data RAM. The clock frequency for the external SRAM is provided by dividing the MPC750's internal clock by ratios of 1, 1.5, 2, 2.5, or 3, programmed through the L2CR[L2CLK] bits. The L2 clock is phase-adjusted to synchronize the clocking of the latches in the MPC750's L2 cache interface with the clocking of the external SRAM by means of an on-chip delay-locked loop (DLL). The ratio selected for the L2 clock is dependent on the frequency supported by the external SRAMs, the MPC750's internal frequency of operation, and the range of phase adjustment supported by the L2 DLL. Refer to the MPC750 hardware specifications for additional information about L2 clock configuration.
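A proposed divider can be sanity-checked against the constraints stated in Table 9-1: the L2 clock must not be slower than the 60x bus clock, flow-through SRAMs need divide-by-2 or slower, and L2SL is generally set below 150 MHz. The following sketch encodes only those three rules. It does not check the SRAM's own maximum frequency or the DLL phase range, which come from the hardware specifications. Function names and the example frequencies used with it are assumptions.

```python
from fractions import Fraction

# L2CR[L2CLK] encodings from Table 9-1 (reserved values omitted).
L2CLK_DIVIDERS = {
    0b001: Fraction(1),
    0b010: Fraction(3, 2),
    0b100: Fraction(2),
    0b101: Fraction(5, 2),
    0b110: Fraction(3),
}


def check_l2_clock(core_hz, bus_hz, l2clk, ram_type):
    """Return (l2_hz, problems) for a proposed L2 clock setting."""
    if l2clk not in L2CLK_DIVIDERS:
        raise ValueError("L2CLK encoding is reserved or disables the clock")
    divider = L2CLK_DIVIDERS[l2clk]
    l2_hz = Fraction(core_hz) / divider
    problems = []
    if l2_hz < bus_hz:
        problems.append("L2 clock slower than 60x bus clock")
    if ram_type == "flow-through" and divider < 2:
        problems.append("flow-through SRAM needs divide-by-2 or slower")
    return l2_hz, problems


def suggest_l2sl(l2_hz):
    """True when the table's guidance suggests setting L2SL."""
    return l2_hz < 150_000_000
```

For instance, a hypothetical 400-MHz core with a 100-MHz bus and the divide-by-2 encoding gives a 200-MHz L2 clock with no problems reported.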
9.1.8 L2 Cache SRAM Timing Examples
This section describes the signal timing for the three types of SRAM (flow-through burst SRAM, pipelined burst SRAM, and late-write SRAM) supported by the MPC750's L2 cache interface. The timing diagrams illustrate the best case logical (ideal, non AC-timing accurate) interface operations. For proper interface operation, the designer must select SRAMs that support the signal sequencing illustrated in the timing diagrams. Designers should also note that during burst transfers into and out of the L2 cache SRAM array, an address is generated by the MPC750 for each data beat. The SRAM selected for a system design is usually a function of desired system performance, L2 bus frequency, and SRAM unit cost. The following sections describe the operation of the three SRAM types supported by the MPC750, and the design trade-offs associated with each.
9.1.8.1 Flow-Through Burst SRAM
Flow-through burst SRAMs operate by clocking in the address, and driving the data directly to the bus from the SRAM memory array. This behavior allows the flow-through burst SRAMs to provide initial read data one cycle sooner than pipelined burst SRAMs, but the flow-through burst SRAM frequencies available may only support the slowest L2 bus frequencies. The MPC750 supports flow-through burst SRAM at L2 clock ratios of /2, /2.5, and /3. Figure 9-2 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst wr, burst rd. Beats: R0-R3, Rxtr, W4-W7, R8-R11, Rxtr.]
Note: Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-2. Burst Read-Write-Read L2 Cache Access (Flow-Through)
Figure 9-3 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst rd, rd modify wr, burst wr. Beats: R0-R8, Rxtr, W9-W13.]
Note: Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-3. Burst Read-Modify-Write L2 Cache Access (Flow-Through)
Figure 9-4 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with flow-through burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: aborted rd, burst rd, burst wr, burst wr. Beats: R0 (aborted), R1-R4, Rxtr, W5-W8, W9-W12.]
Note: Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-4. Burst Read-Write-Write L2 Cache Access (Flow-Through)
9.1.8.2 Pipelined Burst SRAM
Pipelined burst SRAMs operate at higher frequencies than flow-through burst SRAMs by clocking the read data from the memory array into a buffer before driving the data onto the data bus. This causes initial read accesses by the pipelined burst SRAMs to occur one cycle later than flow-through burst SRAMs, but the L2 bus frequencies supported can be higher. Note that the MPC750's L2 cache interface requires the use of single-cycle deselect pipelined burst SRAM for proper operation. Figure 9-5 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst wr, burst rd. Beats: R0-R3, Rxtr, W4-W7, R8-R11, Rxtr. Rdrv is marked on the data bus before R0 and before R8.]
Notes: Rdrv indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-5. Burst Read-Write-Read L2 Cache Access (Pipelined)
Figure 9-6 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst rd, rd modify wr, burst wr. Beats: R0-R8, Rxtr, W9-W13. Rdrv is marked on the data bus before R0.]
Notes: Rdrv indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-6. Burst Read-Modify-Write L2 Cache Access (Pipelined)
Figure 9-7 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with pipelined burst SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: aborted rd, burst rd, burst wr, burst wr. Beats: R0 (aborted), R1-R4, Rxtr, W5-W8, W9-W12. Rdrv is marked twice on the data bus, the second time before R1.]
Notes: Rdrv indicates where some burst RAMs may begin driving the data bus. Rxtr indicates where an extra read cycle is signaled to keep the burst RAM driving the data bus for the last read.
Figure 9-7. Burst Read-Write-Write L2 Cache Access (Pipelined)
9.1.8.3 Late-Write SRAM
Late-write SRAMs offer improved performance when compared to pipelined burst SRAMs by not requiring an extra read cycle during read operations, and requiring one cycle less when transitioning from a read to write operation. Late-write SRAMs implement an internal write queue, allowing write data to be provided one cycle after the write operation is signaled on the address and control buses. In this way write operations are queued on the address and data bus in the same way as read operations, allowing transitions between read and write operations to occur more efficiently. Figure 9-8 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with late-write SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst wr, burst rd. Beats: R0-R3, W4-W7, R8-R11, with (WQ) marked during the write burst.]
Note: WQ is the last previous write that was queued in the late-write RAM.
Figure 9-8. Burst Read-Write-Read L2 Cache Access (Late-Write SRAM)
Figure 9-9 shows a burst read-modify-write memory access sequence when the L2 cache interface is configured with late-write SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: burst rd, burst rd, rd modify wr, burst wr. Beats: R0-R8, W9-W13, with (WQ) marked during the write burst.]
Note: WQ is the last previous write that was queued in the late-write RAM.
Figure 9-9. Burst Read-Modify-Write L2 Cache Access (Late-Write SRAM)
Figure 9-10 shows a burst read-write-write memory access sequence when the L2 cache interface is configured with late-write SRAM.
[Timing diagram not reproduced. Signals: SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, SRAMData. Operations labeled: aborted rd, burst rd, burst wr, burst wr. Beats: R0 (aborted), R1-R4, W5-W8, W9-W12, with (WQ) marked during the first write burst.]
Note: WQ is the last previous write that was queued in the late-write RAM.
Figure 9-10. Burst Read-Write-Write L2 Cache Access (Late-Write SRAM)
Chapter 10 Power and Thermal Management
The MPC750 microprocessor is specifically designed for low-power operation. It provides both automatic and program-controlled power reduction modes for progressive reduction of power consumption. It also provides a thermal assist unit (TAU) to allow on-chip thermal measurement, allowing sophisticated thermal management for high-performance portable systems. This chapter describes the hardware support provided by the MPC750 for power and thermal management. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
10.1 Dynamic Power Management
Dynamic power management (DPM) automatically powers up and down the individual execution units of the MPC750, based upon the contents of the instruction stream. For example, if no floating-point instructions are being executed, the floating-point unit is automatically powered down. Power is not actually removed from the execution unit; instead, each execution unit has an independent clock input, which is automatically controlled on a clock-by-clock basis. Since CMOS circuits consume negligible power when they are not switching, stopping the clock to an execution unit effectively eliminates its power consumption. The operation of DPM is completely transparent to software or any external hardware. Dynamic power management is enabled by setting HID0[DPM] to 1.
10.2 Programmable Power Modes
The MPC750 provides four programmable power states--full power, doze, nap, and sleep. Software selects these modes by setting one (and only one) of the three power saving mode bits in the HID0 register. Hardware can enable a power management state through external asynchronous interrupts. Such a hardware interrupt causes the transfer of program flow to interrupt handler code that then invokes the appropriate power saving mode. The MPC750 provides a separate interrupt and interrupt vector for power management--the system management interrupt (SMI). The MPC750 also contains a decrementer which allows it to enter the nap or doze mode for a predetermined amount of time and then return to full power operation through a decrementer interrupt. Note that the MPC750 cannot switch from one power management mode to another without first returning to full-power mode. The sleep mode disables bus snooping; therefore, a hardware handshake is provided to ensure
coherency before the MPC750 enters this power management mode. Table 10-1 summarizes the four power states.
Table 10-1. MPC750 Microprocessor Programmable Power Modes
PM Mode | Functioning Units | Activation Method | Full-Power Wake Up Method
--------|-------------------|-------------------|--------------------------
Full power | All units active | -- | --
Full power (with DPM) | Requested logic by demand | By instruction dispatch | --
Doze | Bus snooping; data cache as needed; decrementer timer | Controlled by SW | External asynchronous exceptions*; decrementer interrupt; performance monitor interrupt; thermal management interrupt; reset
Nap | Bus snooping (enabled by deassertion of QACK); decrementer timer | Controlled by hardware and software | External asynchronous exceptions; decrementer interrupt; performance monitor interrupt; thermal management interrupt; reset
Sleep | None | Controlled by hardware and software | External asynchronous exceptions; performance monitor interrupt; thermal management interrupt; reset
Note: * Exceptions are referred to as interrupts in the architecture specification.
10.2.1 Power Management Modes
The following sections describe the characteristics of the MPC750's power management modes, the requirements for entering and exiting the various modes, and the system capabilities provided by the MPC750 while the power management modes are active.
10.2.1.1 Full-Power Mode with DPM Disabled
Full-power mode with DPM disabled is selected when the DPM enable bit (bit 11) in HID0 is cleared.
* Default state following power-up and HRESET
* All functional units are operating at full processor speed at all times.
10.2.1.2 Full-Power Mode with DPM Enabled
Full-power mode with DPM enabled (HID0[DPM] = 1) provides on-chip power management without affecting the functionality or performance of the MPC750.
* Required functional units are operating at full processor speed.
* Functional units are clocked only when needed.
* No software or hardware intervention is required after mode is set.
* Software/hardware and performance transparent
10.2.1.3 Doze Mode
Doze mode disables most functional units but maintains cache coherency by enabling the bus interface unit and snooping. A snoop hit causes the MPC750 to enable the data cache, copy the data back to memory, disable the cache, and fully return to the doze state.
* Most functional units disabled
* Bus snooping and time base/decrementer still enabled
* Doze mode sequence
  -- Set doze bit (HID0[8] = 1), clear nap and sleep bits (HID0[9] and HID0[10] = 0)
  -- MPC750 enters doze mode after several processor clocks
* Several methods of returning to full-power mode
  -- Assert INT, SMI, MCP, decrementer, performance monitor, or thermal management interrupts
  -- Assert hard reset or soft reset
* Transition to full-power state takes no more than a few processor cycles
* PLL running and locked to SYSCLK
10.2.1.4 Nap Mode
The nap mode disables the MPC750 but still maintains the phase-locked loop (PLL), delay locked loop (DLL), L2CLK_OUTA and L2CLK_OUTB output signals, and the time base/decrementer. The time base can be used to restore the MPC750 to the full-power state after a programmed amount of time.
To maintain data coherency, bus snooping is disabled for nap and sleep modes through a hardware handshake sequence using the quiesce request (QREQ) and quiesce acknowledge (QACK) signals. The MPC750 asserts the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it will assert QACK and the MPC750 will enter the nap mode. If the system determines that a bus snoop cycle is required, QACK is negated to the MPC750 for at least eight bus clock cycles, and the MPC750 will then be able to respond to a snoop cycle. Assertion of QACK following the snoop cycle will again disable the MPC750's snoop capability. The MPC750's power dissipation while in nap mode with QACK negated is the same as the power dissipation while in doze mode.
Note that when in nap mode the DLL should be kept locked to enable a quick recovery to full-power mode without having to wait for the DLL to re-lock. Additionally, an L2ZZ signal is provided by the MPC750's L2 cache interface to drive external SRAM into a low power mode when the nap or sleep modes are invoked. The L2ZZ signal is enabled by setting the L2CR[CTL] bit to 1. Note that if bus snooping is to be performed through negation of the QACK signal, the L2CR[CTL] bit should always be cleared to 0.
* Time base/decrementer still enabled
* Most functional units disabled
* All nonessential input receivers disabled
* Nap mode sequence
  -- Set nap bit (HID0[9] = 1), clear doze and sleep bits (HID0[8] and HID0[10] = 0)
  -- MPC750 asserts quiesce request (QREQ) signal
  -- System asserts quiesce acknowledge (QACK) signal
  -- MPC750 enters nap mode after several processor clocks
* Nap mode bus snoop sequence
  -- System deasserts QACK signal for eight or more bus clock cycles
  -- MPC750 snoops address tenure(s) on bus
  -- System asserts QACK signal to restore full nap mode
* Several methods of returning to full-power mode
  -- Assert INT, SMI, MCP, decrementer, performance monitor, or thermal management interrupts
  -- Assert hard reset or soft reset
* Transition to full-power takes no more than a few processor cycles
* PLL and DLL running and locked to SYSCLK
10.2.1.5 Sleep Mode
Sleep mode consumes the least amount of power of the four modes since all functional units are disabled. To conserve the maximum amount of power, the PLL may be disabled by placing the PLL_CFG signals in the PLL bypass mode, and disabling SYSCLK. Note that forcing the SYSCLK signal into a static state does not disable the MPC750's PLL, which will continue to operate internally at an undefined frequency unless placed in PLL bypass mode. Additionally, if the PLL is not disabled, the L2 cache interface DLL will remain locked and the L2CLK_OUTA and L2CLK_OUTB signals will remain active. The DLL is disabled by clearing the L2CR[L2E] bit to 0.
Due to the fully static design of the MPC750, internal processor state is preserved when no internal clock is present. Because the time base and decrementer are disabled while the MPC750 is in sleep mode, the MPC750's time base contents will have to be updated from an external time base after exiting sleep mode if maintaining an accurate time-of-day is required.
Before entering the sleep mode, the MPC750 asserts the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it asserts QACK and the MPC750 will enter sleep mode.
* All functional units disabled (including bus snooping and time base)
* All nonessential input receivers disabled
  -- Internal clock regenerators disabled
  -- PLL and DLL still running (see below)
* Sleep mode sequence
  -- Set sleep bit (HID0[10] = 1), clear doze and nap bits (HID0[8] and HID0[9] = 0)
  -- MPC750 asserts quiesce request (QREQ)
  -- System asserts quiesce acknowledge (QACK)
  -- MPC750 enters sleep mode after several processor clocks
* Several methods of returning to full-power mode
  -- Assert INT, SMI, or MCP interrupts
  -- Assert hard reset or soft reset
* PLL and DLL may be disabled and SYSCLK may be removed while in sleep mode
* Return to full-power mode after PLL and SYSCLK are disabled in sleep mode
  -- Enable SYSCLK
  -- Reconfigure PLL into desired processor clock mode
  -- System logic waits for PLL startup and relock time (100 µs)
  -- System logic asserts one of the sleep recovery signals (for example, INT or SMI)
  -- Reconfigure DLL, wait for DLL relock (640 L2 clock cycles) and re-enable L2 cache through the L2CR
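The DLL relock wait in the wake-up sequence above is given in L2 clock cycles, so the wall-clock delay depends on the L2 clock rate. The following is a minimal arithmetic sketch of that conversion; the function name and the clock rates used in the usage note are illustrative assumptions, not values from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* DLL relock wait from the sleep-mode wake-up sequence, in L2 clocks. */
#define DLL_RELOCK_L2_CLOCKS 640u

/* Hypothetical helper: nanoseconds needed for the DLL to relock at the
 * given L2 clock rate, rounded up so the caller never waits too little. */
uint64_t dll_relock_ns(uint32_t l2_clock_hz)
{
    assert(l2_clock_hz > 0);
    const uint64_t ns_per_s = UINT64_C(1000000000);
    return (DLL_RELOCK_L2_CLOCKS * ns_per_s + l2_clock_hz - 1) / l2_clock_hz;
}
```

For an assumed 100-MHz L2 clock, dll_relock_ns(100000000) evaluates to 6400 ns, that is, 6.4 microseconds.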
10.2.2 Power Management Software Considerations
Since the MPC750 is a dual-issue processor with out-of-order execution capability, care must be taken in how the power management mode is entered. Furthermore, nap and sleep modes require all outstanding bus operations to be completed before these power management modes are entered. Normally, during system configuration time, one of the power management modes would be selected by setting the appropriate HID0 mode bit. Later on, the power management mode is invoked by setting the MSR[POW] bit. To ensure a clean transition into and out of a power management mode, set the MSR[EE] bit to 1 and execute the following code sequence:
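        sync                # wait for outstanding operations to complete
        mtmsr[POW = 1]      # write the MSR with the POW bit set
        isync
loop:   b loop              # spin until the power management mode is entered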
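The requirement that one (and only one) of the three HID0 power-saving bits be set can be captured in a small helper that builds the HID0 image before it is written. The sketch below is illustrative only: the bit positions HID0[8], HID0[9], and HID0[10] come from the mode descriptions above, while the macro and function names are hypothetical, and the actual write to HID0 (an mtspr in supervisor mode) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end: bit 0 is the MSB
 * of a 32-bit register, so bit n corresponds to the mask 1 << (31 - n). */
#define PPC_BIT(n)   (UINT32_C(1) << (31 - (n)))

#define HID0_DOZE    PPC_BIT(8)
#define HID0_NAP     PPC_BIT(9)
#define HID0_SLEEP   PPC_BIT(10)
#define HID0_PM_MASK (HID0_DOZE | HID0_NAP | HID0_SLEEP)

/* Hypothetical helper: return a HID0 image with exactly one power-saving
 * mode bit set, or none when mode_bit is 0 (full power). All other HID0
 * bits, including DPM, are passed through unchanged. */
uint32_t hid0_select_power_mode(uint32_t hid0, uint32_t mode_bit)
{
    assert(mode_bit == 0 || mode_bit == HID0_DOZE ||
           mode_bit == HID0_NAP || mode_bit == HID0_SLEEP);
    return (hid0 & ~HID0_PM_MASK) | mode_bit;
}
```

Clearing all three bits before setting the requested one keeps a stale selection from an earlier configuration from leaving two mode bits set at once.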
10.3 Thermal Assist Unit
With the increasing power dissipation of high-performance processors and operating conditions that span a wider range of temperatures than desktop systems, thermal management becomes an essential part of system design to ensure reliable operation of portable systems. One key aspect of thermal management is ensuring that the junction
temperature of the microprocessor does not exceed the operating specification. While the case temperature can be measured with an external thermal sensor, the thermal constant from the junction to the case can be large, and accuracy can be a problem. This may lead to lower overall system performance due to the necessary compensation to alleviate measurement deficiencies. The MPC750 provides the system designer an efficient means of monitoring junction temperature through the incorporation of an on-chip thermal sensor and programmable control logic to enable a thermal management implementation tightly coupled to the processor for improved performance and reliability.
10.3.1 Thermal Assist Unit Overview
The on-chip thermal assist unit (TAU) is composed of a thermal sensor, a digital-to-analog converter (DAC), a comparator, control logic, and three dedicated SPRs. See Figure 10-1 for a block diagram of the TAU.
[Block diagram. Blocks: Thermal Sensor, DAC, Decoder, Latch, Thermal Sensor Control Logic, Interrupt Control, THRM1, THRM2, THRM3. Output: Thermal Interrupt Request (0x1700).]
Figure 10-1. Thermal Assist Unit Block Diagram
The TAU provides thermal control by periodically comparing the MPC750's junction temperature against user-programmed thresholds, and generating a thermal management interrupt if the threshold values are crossed. The TAU also enables the user to determine the junction temperature through a software successive approximation routine. The TAU is controlled through three supervisor-level SPRs, accessed through the mtspr/mfspr instructions. Two of the SPRs (THRM1 and THRM2) provide temperature threshold values that can be compared to the junction temperature value, and control bits that enable comparison and thermal interrupt generation. The third SPR (THRM3) provides a TAU enable bit and a sample interval timer. Note that all the bits in THRM1, THRM2, and THRM3 are cleared to 0 during a hard reset, and the TAU remains idle and in a low-power state until configured and enabled.
The bit fields in the THRM1 and THRM2 SPRs are described in Table 10-2.
Table 10-2. THRM1 and THRM2 Bit Field Settings
Bits | Field | Description
-----|-------|------------
0 | TIN | Thermal management interrupt bit. Read only. This bit is set if the thermal sensor output crosses the threshold specified in the SPR. The state of this bit is valid only if TIV is set. The interpretation of the TIN bit is controlled by the TID bit.
1 | TIV | Thermal management interrupt valid. Read only. This bit is set by the thermal assist logic to indicate that the thermal management interrupt (TIN) state is valid.
2-8 | Threshold | Threshold value that the output of the thermal sensor is compared to. The threshold range is between 0 and 127 °C, and each bit represents 1 °C. Note that this is not the resolution of the thermal sensor.
9-28 | -- | Reserved. System software should clear these bits to 0.
29 | TID | Thermal management interrupt direction bit. Selects the result of the temperature comparison to set TIN. If TID is cleared to 0, TIN is set and an interrupt occurs if the junction temperature exceeds the threshold. If TID is set to 1, TIN is set and an interrupt is indicated if the junction temperature is below the threshold.
30 | TIE | Thermal management interrupt enable. Enables assertion of the thermal management interrupt signal. The thermal management interrupt is maskable by the MSR[EE] bit. If TIE is cleared to 0 and THRMn is valid, the TIN bit records the status of the junction temperature vs. threshold comparison without asserting an interrupt signal. This feature allows system software to make a successive approximation to estimate the junction temperature.
31 | V | SPR valid bit. This bit is set to indicate that the SPR contains a valid threshold, TID, and TIE control bits. Setting THRM1/2[V] and THRM3[E] to 1 enables operation of the thermal sensor.
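The field positions in Table 10-2 translate directly into shifts and masks. The following sketch shows one way to assemble a THRM1 or THRM2 value from its fields; the macro and function names are hypothetical, and the value would still have to be written with an mtspr instruction, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC bit numbering: bit 0 is the most-significant bit. */
#define PPC_BIT(n)           (UINT32_C(1) << (31 - (n)))

#define THRM_TIN             PPC_BIT(0)   /* read-only status */
#define THRM_TIV             PPC_BIT(1)   /* read-only status */
#define THRM_THRESHOLD_SHIFT 23           /* bits 2-8: LSB sits at bit 8 */
#define THRM_THRESHOLD_MASK  0x7Fu        /* 7 bits, 0 to 127 degrees C */
#define THRM_TID             PPC_BIT(29)
#define THRM_TIE             PPC_BIT(30)
#define THRM_V               PPC_BIT(31)

/* Hypothetical helper: build a valid threshold SPR image. The reserved
 * bits (9-28) and the read-only TIN/TIV bits are left at 0. */
uint32_t thrm_encode(unsigned threshold_c, int tid, int tie)
{
    assert(threshold_c <= THRM_THRESHOLD_MASK);
    uint32_t value = ((uint32_t)threshold_c << THRM_THRESHOLD_SHIFT) | THRM_V;
    if (tid)
        value |= THRM_TID;
    if (tie)
        value |= THRM_TIE;
    return value;
}

/* Extract the threshold field from a threshold SPR image. */
unsigned thrm_threshold(uint32_t thrm)
{
    return (unsigned)((thrm >> THRM_THRESHOLD_SHIFT) & THRM_THRESHOLD_MASK);
}
```

For example, thrm_encode(85, 0, 1) requests an interrupt when the junction temperature exceeds 85 °C and evaluates to 0x2A800003.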
The bit fields in the THRM3 SPR are described in Table 10-3.
Table 10-3. THRM3 Bit Field Settings
Bits | Name | Description
-----|------|------------
0-17 | -- | Reserved for future use. System software should clear these bits to 0.
18-30 | SITV | Sample interval timer value. Number of elapsed processor clock cycles before a junction temperature vs. threshold comparison result is sampled for TIN bit setting and interrupt generation. This is necessary due to the thermal sensor, DAC, and the analog comparator settling time being greater than the processor cycle time. The value should be configured to allow a sampling interval of 20 microseconds.
31 | E | Enables the thermal sensor compare operation if either THRM1[V] or THRM2[V] is set to 1.
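Because SITV counts processor clock cycles, the value that yields a 20-microsecond sampling interval depends on the processor frequency. The sketch below works through that arithmetic and packs the result into a THRM3 image; the function names and the clock rates in the usage note are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define THRM3_E          UINT32_C(1)   /* bit 31 (least-significant bit) */
#define THRM3_SITV_SHIFT 1             /* bits 18-30: LSB sits at bit 30 */
#define THRM3_SITV_MAX   0x1FFFu       /* 13-bit field */

/* Hypothetical helper: processor clocks needed to cover interval_us
 * microseconds, rounded up. Asserts if the result does not fit SITV. */
uint32_t sitv_for_interval(uint32_t cpu_hz, uint32_t interval_us)
{
    uint64_t cycles = ((uint64_t)cpu_hz * interval_us + 999999u) / 1000000u;
    assert(cycles <= THRM3_SITV_MAX);
    return (uint32_t)cycles;
}

/* Hypothetical helper: THRM3 image with the TAU enabled. */
uint32_t thrm3_encode(uint32_t sitv)
{
    assert(sitv <= THRM3_SITV_MAX);
    return (sitv << THRM3_SITV_SHIFT) | THRM3_E;
}
```

At an assumed 400-MHz processor clock, a 20-microsecond interval is 8000 cycles and thrm3_encode(8000) evaluates to 0x3E81. The 13-bit field holds at most 8191 cycles, which is why the helper asserts rather than silently truncating.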
10.3.2 Thermal Assist Unit Operation
The TAU can be programmed to operate in single or dual threshold modes, which results in the TAU generating a thermal management interrupt when one or both threshold values are crossed. In addition, with the appropriate software routine, the TAU can also directly determine the junction temperature. The following sections describe the configuration of the TAU to support these modes of operation.
10.3.2.1 TAU Single Threshold Mode
When the TAU is configured for single threshold mode, either THRM1 or THRM2 can be used to contain the threshold value, and a thermal management interrupt is generated when the threshold value is crossed. To configure the TAU for single threshold operation, set the desired temperature threshold, TID, TIE, and V bits for either THRM1 or THRM2. The unused THRMn threshold SPR should be disabled by clearing the V bit to 0. In this discussion THRMn refers to the THRM threshold SPR (THRM1 or THRM2) selected to contain the active threshold value. After setting the desired operational parameters, the TAU is enabled by setting the THRM3[E] bit to 1, and placing a value allowing a sample interval of 20 microseconds or greater in the THRM3[SITV] field. The THRM3[SITV] setting determines the number of processor clock cycles between input to the DAC and sampling of the comparator output; accordingly, the use of a value smaller than recommended in the THRM3[SITV] field can cause inaccuracies in the sensed temperature. If the junction temperature does not cross the programmed threshold, the THRMn[TIN] bit is cleared to 0 to indicate that no interrupt is required, and the THRMn[TIV] bit is set to 1 to indicate that the TIN bit state is valid. If the threshold value has been crossed, the THRMn[TIN] and THRMn[TIV] bits are set to 1, and a thermal management interrupt is generated if both the THRMn[TIE] and MSR[EE] bits are set to 1. A thermal management interrupt is held asserted internally until recognized by the MPC750's interrupt unit. Once a thermal management interrupt is recognized, further temperature sampling is suspended, and the THRMn[TIN] and THRMn[TIV] values are held until an mtspr instruction is executed to THRMn. The execution of an mtspr instruction to THRMn anytime during TAU operation will clear the THRMn[TIV] bit to 0 and restart the temperature comparison. 
Executing an mtspr instruction to THRM3 will clear both THRM1[TIV] and THRM2[TIV] bits to 0, and restart temperature comparison in THRMn if the THRM3[E] bit is set to 1. Examples of valid THRM1 and THRM2 bit settings are shown in Table 10-4.
Table 10-4. Valid THRM1 and THRM2 Bit Settings
TIN(1) | TIV(1) | TID | TIE | V | Description
-------|--------|-----|-----|---|------------
x | x | x | x | 0 | The threshold in the SPR will not be used for comparison.
x | x | x | 0 | 1 | Threshold is used for comparison, thermal management interrupt assertion is disabled.
x | x | 0 | 0 | 1 | Set TIN and do not assert thermal management interrupt if the junction temperature exceeds the threshold.
x | x | 0 | 1 | 1 | Set TIN and assert thermal management interrupt if the junction temperature exceeds the threshold.
x | x | 1 | 0 | 1 | Set TIN and do not assert thermal management interrupt if the junction temperature is less than the threshold.
x | x | 1 | 1 | 1 | Set TIN and assert thermal management interrupt if the junction temperature is less than the threshold.
x | 0 | x | x | 1 | The state of the TIN bit is not valid.
0 | 1 | 0 | x | 1 | The junction temperature is less than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1 | 1 | 0 | x | 1 | The junction temperature is greater than the threshold and as a result the thermal management interrupt is generated if TIE = 1.
0 | 1 | 1 | x | 1 | The junction temperature is greater than the threshold and as a result the thermal management interrupt is not generated for TIE = 1.
1 | 1 | 1 | x | 1 | The junction temperature is less than the threshold and as a result the thermal management interrupt is generated if TIE = 1.
Note: (1) The TIN and TIV bits are read-only status bits.
10.3.2.2 TAU Dual-Threshold Mode
The configuration and operation of the TAU's dual-threshold mode is similar to single threshold mode, except both THRM1 and THRM2 are configured with desired threshold and TID values, and the TIE and V bits are set to 1. When the THRM3[E] bit is set to 1 to enable temperature measurement and comparison, the first comparison is made with THRM1. If no thermal management interrupt results from the comparison, the number of processor cycles specified in THRM3[SITV] elapses, and the next comparison is made with THRM2. If no thermal management interrupt results from the THRM2 comparison, the time specified by THRM3[SITV] again elapses, and the comparison returns to THRM1. This sequence of comparisons continues until a thermal management interrupt occurs, or the TAU is disabled. When a comparison results in an interrupt, the comparison with the threshold SPR causing the interrupt is halted, but comparisons continue with the other threshold SPR. Following a thermal management interrupt, the interrupt service routine must read both THRM1 and THRM2 to determine which threshold was crossed. Note that it is possible for both threshold values to have been crossed, in which case the TAU ceases making temperature comparisons until an mtspr instruction is executed to one or both of the threshold SPRs.
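The requirement that the interrupt service routine read both threshold SPRs can be sketched as a small decision function. The example below operates on register images that have already been read with mfspr (not shown); the macro and function names are hypothetical, and the bit positions follow Table 10-2.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC bit numbering: bit 0 is the most-significant bit. */
#define PPC_BIT(n)        (UINT32_C(1) << (31 - (n)))
#define THRM_TIN          PPC_BIT(0)
#define THRM_TIV          PPC_BIT(1)
#define THRM_V            PPC_BIT(31)

#define TAU_THRM1_CROSSED 0x1u
#define TAU_THRM2_CROSSED 0x2u

/* A threshold counts as crossed only when the SPR is valid and the TIN
 * state is both valid (TIV = 1) and set. */
int thrm_crossed(uint32_t thrm)
{
    return (thrm & THRM_V) && (thrm & THRM_TIV) && (thrm & THRM_TIN);
}

/* Hypothetical helper for the service routine: report which of the two
 * thresholds were crossed as a bit mask (0, 1, 2, or 3). */
unsigned tau_crossed_mask(uint32_t thrm1, uint32_t thrm2)
{
    unsigned mask = 0;
    if (thrm_crossed(thrm1))
        mask |= TAU_THRM1_CROSSED;
    if (thrm_crossed(thrm2))
        mask |= TAU_THRM2_CROSSED;
    return mask;
}
```

A result of 3 corresponds to the case described above in which both thresholds have been crossed and comparisons stop until a threshold SPR is rewritten.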
10.3.2.3 MPC750 Junction Temperature Determination
While the MPC750's TAU does not implement an analog-to-digital converter to enable the direct determination of the junction temperature, system software can execute a simple successive approximation routine to find the junction temperature. The TAU configuration used to approximate the junction temperature is the same required for single-threshold mode, except that the threshold SPR selected has its TIE bit cleared to 0 to disable thermal management interrupt generation. Once the TAU is enabled, the
successive approximation routine loads a threshold value into the active threshold SPR, and then continuously polls the threshold SPR's TIV bit until it is set to 1, indicating a valid TIN bit. The successive approximation routine can then evaluate the TIN bit value, and then increment or decrement the threshold value for another comparison. This process is continued until the junction temperature is determined.
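One possible form of the successive approximation routine is a binary search over the 0 to 127 °C threshold range, which needs at most seven comparisons. The sketch below is an assumption about how such a routine might be structured, not the routine used by any particular system: the hardware access (write the threshold with TID = 0 and TIE = 0, poll TIV, return TIN) is abstracted behind a hypothetical callback, and a simulated sensor stands in for it so the search can be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware abstraction: program threshold_c into the active
 * threshold SPR with TID = 0, wait for TIV = 1, and return TIN (nonzero
 * when the junction temperature exceeds the threshold). */
typedef int (*tau_compare_fn)(unsigned threshold_c, void *ctx);

/* Binary search for the lowest threshold the junction temperature does
 * not exceed. Saturates at 127, the top of the threshold range. */
unsigned tau_estimate_temperature(tau_compare_fn compare, void *ctx)
{
    unsigned lo = 0, hi = 127;
    while (lo < hi) {
        unsigned mid = (lo + hi) / 2;
        if (compare(mid, ctx))
            lo = mid + 1;   /* TIN = 1: temperature is above mid */
        else
            hi = mid;       /* TIN = 0: temperature is at or below mid */
    }
    return lo;
}

/* Simulated comparator for host-side testing; ctx points to the
 * pretend junction temperature in degrees C. */
int simulated_compare(unsigned threshold_c, void *ctx)
{
    const unsigned *junction_c = ctx;
    return *junction_c > threshold_c;
}
```

Each call to the callback costs at least one sample interval on real hardware, so the bounded number of comparisons is the main reason to prefer a binary search over stepping the threshold one degree at a time.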
10.3.2.4 Power Saving Modes and TAU Operation
The static power saving modes provided by the MPC750 (the nap, doze, and sleep modes) allow the temperature of the processor to be lowered quickly, and can be invoked through the use of the TAU and associated thermal management interrupt. The TAU remains operational in the nap and doze modes, and in sleep mode as long as the SYSCLK signal input remains active. If the SYSCLK signal is made static when sleep mode is invoked, the TAU is rendered inactive. If the MPC750 is entering sleep mode with SYSCLK disabled, the TAU should be configured to disable thermal management interrupts to avoid an unwanted thermal management interrupt when the SYSCLK input signal is restored.
10.4 Instruction Cache Throttling
The MPC750 provides an instruction cache throttling mechanism to effectively reduce the instruction execution rate without the complexity and overhead of dynamic clock control. Instruction cache throttling, when used in conjunction with the TAU and the dynamic power management capability of the MPC750, provides the system designer with a flexible means of controlling device temperature while allowing the processor to continue operating. The instruction cache throttling mechanism simply reduces the instruction forwarding rate from the instruction cache to the instruction dispatcher. Normally, the instruction cache forwards four instructions to the instruction dispatcher every clock cycle if all the instructions hit in the cache. For thermal management the MPC750 provides a supervisor-level instruction cache throttling control (ICTC) SPR. The instruction forwarding rate is reduced by writing a nonzero value into the ICTC[FI] field, and enabling instruction cache throttling by setting the ICTC[E] bit to 1. The overall junction temperature reduction results from dynamic power management reducing the power to the execution units while waiting for instructions to be forwarded from the instruction cache; thus, instruction cache throttling does not provide thermal reduction unless HID0[DPM] is set to 1. Note that during instruction cache throttling the configuration of the PLL and DLL remain unchanged. The bit field settings of the ICTC SPR are shown in Table 10-5.
Table 10-5. ICTC Bit Field Settings
Bits | Name | Description
-----|------|------------
23-30 | FI | Instruction forwarding interval expressed in processor clocks. 0x00: 0 clock cycle; 0x01: 1 clock cycle; ... ; 0xFF: 255 clock cycles.
31 | E | Cache throttling enable. 0: Disable instruction cache throttling. 1: Enable instruction cache throttling.
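As with the thermal SPRs, the ICTC fields in Table 10-5 map onto a simple shift and mask. The following sketch builds an ICTC image from a forwarding interval and an enable flag; the function name is hypothetical and the mtspr write to ICTC is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define ICTC_E        UINT32_C(1)   /* bit 31 (least-significant bit) */
#define ICTC_FI_SHIFT 1             /* bits 23-30: LSB sits at bit 30 */
#define ICTC_FI_MAX   0xFFu         /* 0 to 255 processor clocks */

/* Hypothetical helper: build an ICTC image from the forwarding interval
 * (in processor clocks) and the throttling enable flag. */
uint32_t ictc_encode(unsigned fi, int enable)
{
    assert(fi <= ICTC_FI_MAX);
    return ((uint32_t)fi << ICTC_FI_SHIFT) | (enable ? ICTC_E : 0u);
}
```

For example, ictc_encode(0xFF, 1) selects the longest forwarding interval with throttling enabled and evaluates to 0x000001FF.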
Chapter 11 Performance Monitor
This chapter describes the performance monitor of the MPC750. Note that the MPC755 microprocessor is a derivative of the MPC750 and all descriptions for the MPC750 apply for the MPC755 except as noted in Appendix C, "MPC755 Embedded G3 Microprocessor."
The performance monitor facility provides the ability to monitor and count predefined events such as processor clocks, misses in the instruction cache, data cache, or L2 cache, types of instructions dispatched, mispredicted branches, and other occurrences. The count of such events (which may be an approximation) can be used to trigger the performance monitor exception. The performance monitor facility is not defined by the PowerPC architecture.
The performance monitor can be used for the following:
* To increase system performance with efficient software, especially in a multiprocessing system. Memory hierarchy behavior may be monitored and studied in order to develop algorithms that schedule tasks (and perhaps partition them) and that structure and distribute data optimally.
* To improve processor architecture, the detailed behavior of the MPC750's structure must be known and understood in many software environments. Some environments may not be easily characterized by a benchmark or trace.
* To help system developers bring up and debug their systems.
The performance monitor uses the following MPC750-specific special-purpose registers (SPRs):
* The performance monitor counter registers (PMC1-PMC4) are used to record the number of times a certain event has occurred. UPMC1-UPMC4 provide user-level read access to these registers.
* The monitor mode control registers (MMCR0-MMCR1) are used to enable various performance monitor interrupt functions and select events to count. UMMCR0-UMMCR1 provide user-level read access to these registers.
* The sampled instruction address register (SIA) contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. USIA provides user-level read access to the SIA.
Four 32-bit counters in the MPC750 count occurrences of software-selectable events. Two control registers (MMCR0 and MMCR1) are used to control performance monitor operation. The counters and the control registers are supervisor-level SPRs; however, in the MPC750, the contents of these registers can be read by user-level software using separate SPRs (UMMCR0 and UMMCR1). Control fields in the MMCR0 and MMCR1 select the events to be counted, can enable a counter overflow to initiate a performance monitor exception, and specify the conditions under which counting is enabled. As with other PowerPC exceptions, the performance monitor interrupt follows the normal PowerPC exception model with a defined exception vector offset (0x00F00). Its priority is below the external interrupt and above the decrementer interrupt.
11.1 Performance Monitor Interrupt
The performance monitor provides the ability to generate a performance monitor interrupt triggered by a counter overflow condition in one of the performance monitor counter registers (PMC1-PMC4), shown in Figure 11-3. A counter is considered to have overflowed when its most-significant bit is set. A performance monitor interrupt may also be caused by the flipping from 0 to 1 of certain bits in the time base register, which provides a way to generate a time reference-based interrupt. Although the interrupt signal condition may occur with MSR[EE] = 0, the actual exception cannot be taken until MSR[EE] = 1. When a performance monitor exception is signaled, the action taken depends on the type of event that caused the condition:
* Threshold-related events--When a threshold event signals a performance monitor exception, the address of the instruction that caused the counter to overflow is saved in the SIA register.
* Programmable events--To help track which part of the code was being executed when an exception was signaled, the address of the last completed instruction during that cycle is saved in the SIA.
Exception handling for the performance monitor interrupt exception is described in Section 4.5.13, "Performance Monitor Interrupt (0x00F00)."
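Because a counter is treated as overflowed once its most-significant bit is set, software that wants an interrupt after a given number of events can preload the counter so that it reaches 0x80000000 after exactly that many counts. The sketch below shows the arithmetic; the function names are hypothetical, and writing the value to a PMC register and enabling the interrupt in MMCR0 are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* A PMC is considered overflowed when its most-significant bit is set. */
#define PMC_OVERFLOW_BIT UINT32_C(0x80000000)

/* Hypothetical helper: initial counter value that makes the counter
 * overflow after exactly `events` further counts. */
uint32_t pmc_preload_for(uint32_t events)
{
    assert(events >= 1 && events <= PMC_OVERFLOW_BIT);
    return PMC_OVERFLOW_BIT - events;
}

/* Nonzero when the counter image has the overflow condition. */
int pmc_overflowed(uint32_t pmc)
{
    return (pmc & PMC_OVERFLOW_BIT) != 0;
}
```

For example, pmc_preload_for(1000) evaluates to 0x7FFFFC18; after 999 counts the most-significant bit is still clear, and the 1000th count sets it.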
11.2 Special-Purpose Registers Used by Performance Monitor
The performance monitor incorporates the SPRs listed in Table 11-1. The supervisor-level registers are accessed through mtspr and mfspr instructions; the user-level registers are read-only and are accessed through mfspr.
Table 11-1. Performance Monitor SPRs
SPR Number | spr[5-9] || spr[0-4] | Register Name | Access Level
-----------|----------------------|---------------|-------------
952 | 0b11101 11000 | MMCR0 | Supervisor
953 | 0b11101 11001 | PMC1 | Supervisor
954 | 0b11101 11010 | PMC2 | Supervisor
955 | 0b11101 11011 | SIA | Supervisor
956 | 0b11101 11100 | MMCR1 | Supervisor
957 | 0b11101 11101 | PMC3 | Supervisor
958 | 0b11101 11110 | PMC4 | Supervisor
936 | 0b11101 01000 | UMMCR0 | User (read only)
937 | 0b11101 01001 | UPMC1 | User (read only)
938 | 0b11101 01010 | UPMC2 | User (read only)
939 | 0b11101 01011 | USIA | User (read only)
940 | 0b11101 01100 | UMMCR1 | User (read only)
941 | 0b11101 01101 | UPMC3 | User (read only)
942 | 0b11101 01110 | UPMC4 | User (read only)
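The binary column of Table 11-1 lists each SPR number as two 5-bit halves. The sketch below joins the halves back into the decimal SPR number and, as an illustration of why the split matters, encodes an mfspr instruction word. The encoding details (primary opcode 31, extended opcode 339, and the low-order half of the SPR number placed ahead of the high-order half in the instruction) come from the PowerPC instruction set definition rather than from this section, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 5-bit halves shown in Table 11-1 into the SPR number. */
unsigned spr_number(unsigned high5, unsigned low5)
{
    assert(high5 < 32 && low5 < 32);
    return (high5 << 5) | low5;
}

/* Encode "mfspr rD,SPR". In the instruction word the low-order half of
 * the SPR number occupies the field nearer the opcode, followed by the
 * high-order half, so the halves appear swapped. */
uint32_t mfspr_encode(unsigned rd, unsigned spr)
{
    assert(rd < 32 && spr < 1024);
    uint32_t low5 = spr & 0x1Fu;
    uint32_t high5 = (spr >> 5) & 0x1Fu;
    return (UINT32_C(31) << 26) | ((uint32_t)rd << 21) |
           (low5 << 16) | (high5 << 11) | (UINT32_C(339) << 1);
}
```

For example, spr_number(0x1D, 0x08) gives 936 (UMMCR0), and mfspr_encode(3, 936) evaluates to 0x7C68EAA6. An assembler performs this encoding automatically; the sketch is only meant to make the table's binary column easier to read.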
11.2.1 Performance Monitor Registers
This section describes the registers used by the performance monitor.
11.2.1.1 Monitor Mode Control Register 0 (MMCR0)
The monitor mode control register 0 (MMCR0), shown in Figure 11-1, is a 32-bit SPR provided to specify events to be counted and recorded. MMCR0 can be written to only in supervisor mode. User-level software can read the contents of MMCR0 by issuing an mfspr instruction to UMMCR0, described in Section 11.2.1.2, "User Monitor Mode Control Register 0 (UMMCR0)."
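[Register layout, bit 0 at left: DIS (0), DP (1), DU (2), DMS (3), DMR (4), ENINT (5), DISCOUNT (6), RTCSELECT (7-8), INTONBITTRANS (9), THRESHOLD (10-15), PMC1INTCONTROL (16), PMC2INTCONTROL (17, listed as PMCINTCONTROL in Table 11-2), PMCTRIGGER (18), PMC1SELECT (19-25), PMC2SELECT (26-31).]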
Figure 11-1. Monitor Mode Control Register 0 (MMCR0)
This register must be cleared at power up. Reading this register does not change its contents. Table 11-2 describes the bits of the MMCR0 register.
Table 11-2. MMCR0 Bit Settings
Bits | Name | Description
-----|------|------------
0 | DIS | Disables counting unconditionally. 0: The values of the PMCn counters can be changed by hardware. 1: The values of the PMCn counters cannot be changed by hardware.
1 | DP | Disables counting while in supervisor mode. 0: The PMCn counters can be changed by hardware. 1: If the processor is in supervisor mode (MSR[PR] is cleared), the counters are not changed by hardware.
2 | DU | Disables counting while in user mode. 0: The PMCn counters can be changed by hardware. 1: If the processor is in user mode (MSR[PR] is set), the PMCn counters are not changed by hardware.
3 | DMS | Disables counting while MSR[PM] is set. 0: The PMCn counters can be changed by hardware. 1: If MSR[PM] is set, the PMCn counters are not changed by hardware.
4 | DMR | Disables counting while MSR[PM] is zero. 0: The PMCn counters can be changed by hardware. 1: If MSR[PM] is cleared, the PMCn counters are not changed by hardware.
5 | ENINT | Enables performance monitor interrupt signaling. 0: Interrupt signaling is disabled. 1: Interrupt signaling is enabled. Cleared by hardware when a performance monitor interrupt is signaled. To re-enable these interrupt signals, software must set this bit after servicing the performance monitor interrupt. The IPL ROM code clears this bit before passing control to the operating system.
6 | DISCOUNT | Disables counting of PMCn when a performance monitor interrupt is signaled (that is, ((PMCnINTCONTROL = 1) & (PMCn[0] = 1) & (ENINT = 1)) or the occurrence of an enabled time base transition with ((INTONBITTRANS = 1) & (ENINT = 1)). 0: Signaling a performance monitor interrupt does not affect counting status of PMCn. 1: The signaling of a performance monitor interrupt prevents changing of PMC1 counter. The PMCn counter does not change if PMC2COUNTCTL = 0. Because a time base signal could have occurred along with an enabled counter overflow condition, software should always reset INTONBITTRANS to zero, if the value in INTONBITTRANS was a one.
7-8 | RTCSELECT | 64-bit time base, bit selection enable. 00: Pick bit 63 to count. 01: Pick bit 55 to count. 10: Pick bit 51 to count. 11: Pick bit 47 to count.
9 | INTONBITTRANS | Causes interrupt signaling on bit transition (identified in RTCSELECT) from off to on. 0: Do not allow interrupt signal on the transition of a chosen bit. 1: Signal interrupt on the transition of a chosen bit. Software is responsible for setting and clearing INTONBITTRANS.
10-15 | THRESHOLD | Threshold value. All 6 bits are supported by the MPC750, allowing threshold values from 0 to 63. The intent of the THRESHOLD support is to characterize L1 data cache misses.
16 | PMC1INTCONTROL | Enables interrupt signaling due to PMC1 counter overflow. 0: Disable PMC1 interrupt signaling due to PMC1 counter overflow. 1: Enable PMC1 interrupt signaling due to PMC1 counter overflow.
17 | PMCINTCONTROL | Enable interrupt signaling due to any PMC2-PMC4 counter overflow. Overrides the setting of DISCOUNT. 0: Disable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow. 1: Enable PMC2-PMC4 interrupt signaling due to PMC2-PMC4 counter overflow.
18 | PMCTRIGGER | Can be used to trigger counting of PMC2-PMC4 after PMC1 has overflowed or after a performance monitor interrupt is signaled. 0: Enable PMC2-PMC4 counting. 1: Disable PMC2-PMC4 counting until either PMC1[0] = 1 or a performance monitor interrupt is signaled.
19-25 | PMC1SELECT | PMC1 input selector, 128 events selectable; 25 defined. See Table 11-5.
26-31 | PMC2SELECT | PMC2 input selector, 64 events selectable; 21 defined. See Table 11-6.
MMCR0 can be accessed with the mtspr and mfspr instructions using SPR 952.
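The bit positions in Table 11-2 use PowerPC numbering, where bit 0 is the most-significant bit of the 32-bit register. The following Python sketch shows one way a host-side tool might compute an MMCR0 value before it is written with mtspr. The helper names (ppc_field, mmcr0_value) are hypothetical and are not part of any Motorola software; only the field positions come from Table 11-2.

```python
def ppc_field(value, first_bit, last_bit, width=32):
    """Place `value` into bits first_bit..last_bit, where bit 0 is the MSB."""
    nbits = last_bit - first_bit + 1
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in the field")
    return value << (width - 1 - last_bit)

# Single-bit fields, positions taken from Table 11-2.
MMCR0_ENINT = ppc_field(1, 5, 5)             # bit 5
MMCR0_PMC1INTCONTROL = ppc_field(1, 16, 16)  # bit 16

def mmcr0_value(pmc1select, pmc2select, enint=False, pmc1int=False):
    """Build an MMCR0 image from the two select fields and two enable bits."""
    value = ppc_field(pmc1select, 19, 25) | ppc_field(pmc2select, 26, 31)
    if enint:
        value |= MMCR0_ENINT
    if pmc1int:
        value |= MMCR0_PMC1INTCONTROL
    return value

# PMC1 counts processor cycles (000 0001), PMC2 counts completed
# instructions (00 0010), with PMC1 overflow interrupts enabled.
example = mmcr0_value(0b0000001, 0b000010, enint=True, pmc1int=True)
print(hex(example))
```

On the target, the resulting value would be moved into a GPR and written with mtspr to SPR 952.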
11.2.1.2 User Monitor Mode Control Register 0 (UMMCR0)
The contents of MMCR0 are reflected to UMMCR0, which can be read by user-level software. UMMCR0 can be accessed with the mfspr instruction using SPR 936.
11.2.1.3 Monitor Mode Control Register 1 (MMCR1)
The monitor mode control register 1 (MMCR1) functions as an event selector for performance monitor counter registers 3 and 4 (PMC3 and PMC4). The MMCR1 register is shown in Figure 11-2.
[Figure: bits 0-4 PMC3SELECT; bits 5-9 PMC4SELECT; bits 10-31 reserved, shown as zeros]
Figure 11-2. Monitor Mode Control Register 1 (MMCR1)
Bit settings for MMCR1 are shown in Table 11-3. The corresponding events are described in Section 11.2.1.5, "Performance Monitor Counter Registers (PMC1-PMC4)."
Table 11-3. MMCR1 Bit Settings
Bits | Name | Description
0-4 | PMC3SELECT | PMC3 input selector. 32 events selectable. See Table 11-7 for defined selections.
5-9 | PMC4SELECT | PMC4 input selector. 32 events selectable. See Table 11-8 for defined selections.
10-31 | -- | Reserved
MMCR1 can be accessed with the mtspr and mfspr instructions using SPR 956. User-level software can read the contents of MMCR1 by issuing an mfspr instruction to UMMCR1, described in Section 11.2.1.4, "User Monitor Mode Control Register 1 (UMMCR1)."
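As an illustration of the MMCR1 layout in Table 11-3, the following Python sketch packs and unpacks the two select fields. The function names are hypothetical; the shifts follow from PowerPC bit numbering (bit 0 is the most-significant bit), so bits 0-4 sit at shift 27 and bits 5-9 at shift 22.

```python
def mmcr1_value(pmc3select, pmc4select):
    """Pack PMC3SELECT (bits 0-4) and PMC4SELECT (bits 5-9); bits 10-31 stay zero."""
    for code in (pmc3select, pmc4select):
        if not 0 <= code < 32:
            raise ValueError("select code must fit in 5 bits")
    return (pmc3select << 27) | (pmc4select << 22)

def mmcr1_fields(mmcr1):
    """Return (PMC3SELECT, PMC4SELECT) from a 32-bit MMCR1 image."""
    return (mmcr1 >> 27) & 0x1F, (mmcr1 >> 22) & 0x1F

# PMC3 counts L1 data cache misses (0 0101); PMC4 counts mispredicted branches (01000).
image = mmcr1_value(0b00101, 0b01000)
print(hex(image), mmcr1_fields(image))
```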
11.2.1.4 User Monitor Mode Control Register 1 (UMMCR1)
The contents of MMCR1 are reflected to UMMCR1, which can be read by user-level software. UMMCR1 can be accessed with the mfspr instruction using SPR 940.
11.2.1.5 Performance Monitor Counter Registers (PMC1-PMC4)
PMC1-PMC4, shown in Figure 11-3, are 32-bit counters that can be programmed to generate interrupt signals when they overflow.
[Figure: bit 0 OV; bits 1-31 Counter Value]
Figure 11-3. Performance Monitor Counter Registers (PMC1-PMC4)
The bits contained in the PMC registers are described in Table 11-4.
Table 11-4. PMCn Bit Settings
Bits | Name | Description
0 | OV | Overflow. When this bit is set, it indicates this counter has reached its maximum value.
1-31 | Counter value | Indicates the number of occurrences of the specified event.
Counters overflow when the high-order bit (the sign bit) becomes set; that is, they reach the value 2147483648 (0x8000_0000). However, an interrupt is not signaled unless both MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL in the MMCR0 register are also set as appropriate. Note that the interrupts can be masked by clearing MSR[EE]; the interrupt signal condition may occur with MSR[EE] cleared, but the exception is not taken until MSR[EE] is set. Setting MMCR0[DISCOUNT] forces counters to stop counting when a counter interrupt occurs.
Software is expected to use the mtspr instruction to explicitly set the PMC registers to non-overflowed values. Setting an overflowed value may cause an erroneous exception. For example, if both MMCR0[ENINT] and either PMC1INTCONTROL or PMCINTCONTROL are set and the mtspr instruction loads an overflow value, an interrupt signal may be generated without any event having been counted.
The event to be monitored can be chosen by setting MMCR0[19-31]. The selected events are counted beginning when MMCR0 is set until either MMCR0 is reset or a performance monitor interrupt is generated. Table 11-5 lists the selectable events and their encodings.
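Because overflow is defined as the counter reaching 0x8000_0000, a common way to request an interrupt after a fixed number of events is to preload the counter with 0x8000_0000 minus that number. The following Python sketch shows only the arithmetic; the function names are hypothetical, and the preload technique is an illustration that follows from the overflow rule above rather than a procedure stated in this manual.

```python
OVERFLOW_VALUE = 0x8000_0000  # bit 0 (OV) set, counter value zero

def preload_for(n_events):
    """Non-overflowed start value that sets OV after n_events more counts."""
    if not 1 <= n_events <= OVERFLOW_VALUE:
        raise ValueError("n_events must be between 1 and 2**31")
    return OVERFLOW_VALUE - n_events

def has_overflowed(pmc):
    """True when the OV bit (bit 0, the most-significant bit) is set."""
    return bool(pmc & OVERFLOW_VALUE)

start = preload_for(1000)
print(hex(start))
print(has_overflowed(start + 999), has_overflowed(start + 1000))
```

The value returned by preload_for never has OV set, so writing it with mtspr does not by itself create the erroneous-exception case described above.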
Table 11-5. PMC1 Events--MMCR0[19-25] Select Encodings
Encoding | Description
000 0000 | Register holds current value.
000 0001 | Number of processor cycles
000 0010 | Number of instructions that have completed. Does not include folded branches.
000 0011 | Number of transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT, MMCR0[7-8]. 00 = 15, 01 = 19, 10 = 23, 11 = 31
000 0100 | Number of instructions dispatched--0, 1, or 2 instructions per cycle
000 0101 | Number of eieio instructions completed
000 0110 | Number of cycles spent performing table search operations for the ITLB
000 0111 | Number of accesses that hit the L2
000 1000 | Number of valid instruction EAs delivered to the memory subsystem
000 1001 | Number of times the address of an instruction being completed matches the address in the IABR
000 1010 | Number of loads that miss the L1 with latencies that exceeded the threshold value
000 1011 | Number of branches that are unresolved when processed
000 1100 | Number of cycles the dispatcher stalls due to a second unresolved branch in the instruction stream
All others | Reserved. May be used in a later revision.
Bits MMCR0[26-31] specify events associated with PMC2, as shown in Table 11-6.
Table 11-6. PMC2 Events--MMCR0[26-31] Select Encodings
Encoding | Description
00 0000 | Register holds current value.
00 0001 | Counts processor cycles.
00 0010 | Counts completed instructions. Does not include folded branches.
00 0011 | Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100 | Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101 | Counts L1 instruction cache misses.
00 0110 | Counts ITLB misses.
00 0111 | Counts L2 instruction misses.
00 1000 | Counts branches predicted or resolved not taken.
00 1001 | Counts MSR[PR] bit toggles.
00 1010 | Counts times reserved load operations completed.
00 1011 | Counts completed load and store instructions.
00 1100 | Counts snoops to the L1 and the L2.
00 1101 | Counts L1 cast-outs to the L2.
00 1110 | Counts completed system unit instructions.
00 1111 | Counts instruction fetch misses in the L1.
01 0000 | Counts branches allowing out-of-order execution that resolved correctly.
All others | Reserved.
Bits MMCR1[0-4] specify events associated with PMC3, as shown in Table 11-7.
Table 11-7. PMC3 Events--MMCR1[0-4] Select Encodings
Encoding | Description
0 0000 | Register holds current value.
0 0001 | Number of processor cycles
0 0010 | Number of completed instructions, not including folded branches.
0 0011 | Number of TBL bit transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
0 0100 | Number of instructions dispatched. 0, 1, or 2 per cycle.
0 0101 | Number of L1 data cache misses
0 0110 | Number of DTLB misses
0 0111 | Number of L2 data misses
0 1000 | Number of taken branches, including predicted branches.
0 1001 | Number of transitions between marked and unmarked processes while in user mode. That is, the number of MSR[PM] toggles while the processor is in user mode.
0 1010 | Number of store conditional instructions completed
0 1011 | Number of instructions completed from the FPU
0 1100 | Number of L2 castouts caused by snoops to modified lines
0 1101 | Number of cache operations that hit in the L2 cache
0 1110 | Reserved
0 1111 | Number of cycles generated by L1 load misses
1 0000 | Number of branches in the second speculative stream that resolve correctly
1 0001 | Number of cycles the BPU stalls due to LR or CR unresolved dependencies
All others | Reserved. May be used in a later revision.
Bits MMCR1[5-9] specify events associated with PMC4, as shown in Table 11-8.
Table 11-8. PMC4 Events--MMCR1[5-9] Select Encodings
Encoding | Comments
00000 | Register holds current value
00001 | Number of processor cycles
00010 | Number of completed instructions, not including folded branches
00011 | Number of TBL bit transitions from 0 to 1 of specified bits in time-base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
00100 | Number of instructions dispatched. 0, 1, or 2 per cycle
00101 | Number of L2 castouts
00110 | Number of cycles spent performing table searches for DTLB accesses.
00111 | Reserved. May be used in a later revision.
01000 | Number of mispredicted branches
01001 | Number of transitions between marked and unmarked processes while in user mode. That is, the number of MSR[PM] toggles while the processor is in supervisor mode.
01010 | Number of store conditional instructions completed with reservation intact
01011 | Number of completed sync instructions
01100 | Number of snoop request retries
01101 | Number of completed integer operations
01110 | Number of cycles the BPU cannot process new branches due to having two unresolved branches
All others | Reserved. May be used in a later revision.
The PMC registers can be accessed with the mtspr and mfspr instructions using the following SPR numbers:
* PMC1 is SPR 953
* PMC2 is SPR 954
* PMC3 is SPR 957
* PMC4 is SPR 958
11.2.1.6 User Performance Monitor Counter Registers (UPMC1-UPMC4)
The contents of PMC1-PMC4 are reflected to UPMC1-UPMC4, which can be read by user-level software. The UPMC registers can be read with the mfspr instruction using the following SPR numbers:
* UPMC1 is SPR 937
* UPMC2 is SPR 938
* UPMC3 is SPR 941
* UPMC4 is SPR 942
11.2.1.7 Sampled Instruction Address Register (SIA)
The sampled instruction address register (SIA) is a supervisor-level register that contains the effective address of an instruction executing at or around the time that the processor signals the performance monitor interrupt condition. The SIA is shown in Figure 11-4.
[Figure: bits 0-31 Instruction Address]
Figure 11-4. Sampled Instruction Address Register (SIA)
If the performance monitor interrupt is triggered by a threshold event, the SIA contains the address of the exact instruction (called the sampled instruction) that caused the counter to overflow. If the performance monitor interrupt was caused by something besides a threshold event, the SIA contains the address of the last instruction completed during that cycle. SIA can be accessed with the mtspr and mfspr instructions using SPR 955.
11.2.1.8 User Sampled Instruction Address Register (USIA)
The contents of SIA are reflected to USIA, which can be read by user-level software. USIA can be accessed with the mfspr instruction using SPR 939.
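The SPR numbers given in Sections 11.2.1.1 through 11.2.1.8 can be collected into a lookup table. The Python sketch below does that and checks an observation that follows from the listed numbers: each user-level register is numbered 16 below its supervisor-level counterpart. The dictionary and function names are hypothetical.

```python
# Supervisor-level performance monitor SPRs, as listed in this section.
SUPERVISOR_SPR = {
    "MMCR0": 952, "PMC1": 953, "PMC2": 954, "SIA": 955,
    "MMCR1": 956, "PMC3": 957, "PMC4": 958,
}

# User-level read-only reflections, as listed in this section.
USER_SPR = {
    "UMMCR0": 936, "UPMC1": 937, "UPMC2": 938, "USIA": 939,
    "UMMCR1": 940, "UPMC3": 941, "UPMC4": 942,
}

def user_spr(supervisor_name):
    """SPR number of the user-level reflection of a supervisor-level register."""
    return USER_SPR["U" + supervisor_name]

# Every user-level SPR number is the supervisor-level number minus 16.
offsets = {name: SUPERVISOR_SPR[name] - user_spr(name) for name in SUPERVISOR_SPR}
print(offsets)
```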
11.3 Event Counting
Counting can be enabled if conditions in the processor state match a software-specified condition. Because a software task scheduler may switch a processor's execution among multiple processes and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The performance monitor (PM) bit, MSR[29], is used for this purpose. System software may set this bit when a marked process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR[PR] and MSR[PM] together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches a state for which monitoring is enabled in MMCR0, counting is enabled. The following are states that can be monitored:
* (Supervisor) only
* (User) only
* (Marked and user) only
* (Not marked and user) only
* (Marked and supervisor) only
* (Not marked and supervisor) only
* (Marked) only
* (Not marked) only
In addition, one of two unconditional counting modes may be specified:
* Counting is unconditionally enabled regardless of the states of MSR[PM] and MSR[PR]. This can be accomplished by clearing MMCR0[0-4].
* Counting is unconditionally disabled regardless of the states of MSR[PM] and MSR[PR]. This is done by setting MMCR0[0].
The performance monitor counters count specified events and are used to generate performance monitor exceptions when an overflow (most-significant bit is a 1) situation occurs. The MPC750 performance monitor has four 32-bit registers that can count up to 0x7FFFFFFF (2,147,483,647 in decimal) before overflowing. Bit 0 of the registers is used to determine when an interrupt condition exists.
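The monitored states listed above can be expressed as combinations of the five disable bits MMCR0[0-4] (DIS, DP, DU, DMS, DMR) from Table 11-2. The Python sketch below models that decision as a predicate. It is an interpretation of the bit descriptions in Table 11-2, not a description of the hardware logic, and the names are hypothetical.

```python
# MMCR0[0-4] as masks; bit 0 is the most-significant bit of the register.
DIS, DP, DU, DMS, DMR = (1 << (31 - bit) for bit in range(5))

def counting_enabled(mmcr0, msr_pr, msr_pm):
    """Model of whether the PMCn counters may change for a given MSR state."""
    if mmcr0 & DIS:
        return False                 # disabled unconditionally
    if (mmcr0 & DP) and not msr_pr:
        return False                 # disabled in supervisor mode
    if (mmcr0 & DU) and msr_pr:
        return False                 # disabled in user mode
    if (mmcr0 & DMS) and msr_pm:
        return False                 # disabled while the process is marked
    if (mmcr0 & DMR) and not msr_pm:
        return False                 # disabled while the process is not marked
    return True

# "(Marked and user) only": disable supervisor mode and unmarked processes.
MARKED_USER_ONLY = DP | DMR
print(counting_enabled(MARKED_USER_ONLY, msr_pr=True, msr_pm=True))
```

With all five bits clear the predicate returns True for every state, which corresponds to the unconditionally enabled mode described above.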
11.4 Event Selection
Event selection is handled through MMCR0 and MMCR1, described in Table 11-2 and Table 11-3, respectively. Event selection is described as follows:
* The four event-select fields in MMCR0 and MMCR1 are as follows:
-- MMCR0[19-25] PMC1SELECT--PMC1 input selector, 128 events selectable; 25 defined. See Table 11-5.
-- MMCR0[26-31] PMC2SELECT--PMC2 input selector, 64 events selectable; 21 defined. See Table 11-6.
-- MMCR1[0-4] PMC3SELECT--PMC3 input selector. 32 events selectable. See Table 11-7 for defined selections.
-- MMCR1[5-9] PMC4SELECT--PMC4 input selector. 32 events selectable. See Table 11-8 for defined selections.
* In the tables, a correlation is established between each counter, events to be traced, and the pattern required for the desired selection.
* The first five events are common to all four counters and are considered to be reference events. These are as follows:
-- 00000--Register holds current value
-- 00001--Number of processor cycles
-- 00010--Number of completed instructions, not including folded branches
-- 00011--Number of TBL bit transitions from 0 to 1 of specified bits in time base lower register. Bits are specified through RTCSELECT (MMCR0[7-8]). 0 = 47, 1 = 51, 2 = 55, 3 = 63.
-- 00100--Number of instructions dispatched. 0, 1, or 2 per cycle
* Some events can have multiple occurrences per cycle, and therefore need two or three bits to represent them.
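The select fields differ in width and live in two registers (PMC1 and PMC2 in MMCR0, PMC3 and PMC4 in MMCR1, per Tables 11-2 and 11-3). The Python sketch below shows one way to update a single select field in an existing register image without disturbing the other bits. The table and function names are hypothetical.

```python
# counter number -> (register, first bit, last bit), per Tables 11-2 and 11-3.
SELECT_FIELDS = {
    1: ("MMCR0", 19, 25),
    2: ("MMCR0", 26, 31),
    3: ("MMCR1", 0, 4),
    4: ("MMCR1", 5, 9),
}

def select_event(counter, encoding, current=0):
    """Return (register name, new 32-bit image) with the counter's select field replaced."""
    register, first, last = SELECT_FIELDS[counter]
    width = last - first + 1
    if not 0 <= encoding < (1 << width):
        raise ValueError("encoding does not fit in the select field")
    shift = 31 - last                      # bit 0 is the most-significant bit
    mask = ((1 << width) - 1) << shift
    return register, (current & ~mask & 0xFFFFFFFF) | (encoding << shift)

# Reference event 00010 (completed instructions) on PMC1.
print(select_event(1, 0b0000010))
```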
11.5 Warnings
The following warnings should be noted:
* Only those loads and stores in queue position 0 of their respective load/store queues are monitored when a threshold event is selected in PMC1.
* The MPC750 cannot accurately track threshold events with respect to the following types of loads and stores:
-- Unaligned load and store operations that cross a word boundary
-- Load and store multiple operations
-- Load and store string operations
Appendix A PowerPC Instruction Set Listings
This appendix lists the MPC750 microprocessor's instruction set as well as the additional PowerPC instructions not implemented in the MPC750. Instructions are sorted by mnemonic, opcode, function, and form. Also included in this appendix is a quick reference table that contains general information, such as the architecture level, privilege level, and form, and indicates if the instruction is 64-bit and optional. Note that the MPC750 is a 32-bit microprocessor and does not implement any 64-bit instructions. Note that split fields, which represent the concatenation of sequences from left to right, are shown in lowercase. For more information refer to Chapter 8, "Instruction Set," in the Programming Environments Manual.
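The spr field of mfspr and mtspr is an example of a split field. As a hedged illustration, the Python sketch below assembles an mfspr instruction word from the values listed in this appendix (primary opcode 31, extended opcode 339). It assumes the PowerPC architecture's definition that the two 5-bit halves of the SPR number are swapped in the instruction word; that detail comes from the architecture definition, not from the tables here, and the function name is hypothetical.

```python
def encode_mfspr(rd, spr):
    """Encode mfspr rD,SPR as a 32-bit instruction word."""
    if not (0 <= rd < 32 and 0 <= spr < 1024):
        raise ValueError("rD must be 0-31 and SPR must be 0-1023")
    # Split field: the low 5 bits of the SPR number occupy bits 11-15 and
    # the high 5 bits occupy bits 16-20.
    split = ((spr & 0x1F) << 5) | (spr >> 5)
    return (31 << 26) | (rd << 21) | (split << 11) | (339 << 1)

print(hex(encode_mfspr(0, 8)))    # mfspr r0,8 (SPR 8 is the link register)
print(hex(encode_mfspr(3, 952)))  # mfspr r3,952 (MMCR0)
```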
A.1 Instructions Sorted by Mnemonic
Table A-1 lists the instructions implemented in the PowerPC architecture in alphabetical order by mnemonic.
Key: Reserved bits
Table A-1. Complete Instruction List Sorted by Mnemonic
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
addx | 31 | D | A | B | OE, 266 | Rc
addcx | 31 | D | A | B | OE, 10 | Rc
addex | 31 | D | A | B | OE, 138 | Rc
addi | 14 | D | A | SIMM (16-31)
addic | 12 | D | A | SIMM (16-31)
addic. | 13 | D | A | SIMM (16-31)
addis | 15 | D | A | SIMM (16-31)
addmex | 31 | D | A | 00000 | OE, 234 | Rc
addzex | 31 | D | A | 00000 | OE, 202 | Rc
andx | 31 | S | A | B | 28 | Rc
andcx | 31 | S | A | B | 60 | Rc
andi. | 28 | S | A | UIMM (16-31)
andis. | 29 | S | A | UIMM (16-31)
bx | 18 | LI (6-29) | AA (30) | LK
bcx | 16 | BO | BI | BD (16-29) | AA (30) | LK
bcctrx | 19 | BO | BI | 00000 | 528 | LK
bclrx | 19 | BO | BI | 00000 | 16 | LK
cmp | 31 | crfD, 0, L | A | B | 0 | 0
cmpi | 11 | crfD, 0, L | A | SIMM (16-31)
cmpl | 31 | crfD, 0, L | A | B | 32 | 0
cmpli | 10 | crfD, 0, L | A | UIMM (16-31)
cntlzwx | 31 | S | A | 00000 | 26 | Rc
crand | 19 | crbD | crbA | crbB | 257 | 0
crandc | 19 | crbD | crbA | crbB | 129 | 0
creqv | 19 | crbD | crbA | crbB | 289 | 0
crnand | 19 | crbD | crbA | crbB | 225 | 0
crnor | 19 | crbD | crbA | crbB | 33 | 0
cror | 19 | crbD | crbA | crbB | 449 | 0
crorc | 19 | crbD | crbA | crbB | 417 | 0
crxor | 19 | crbD | crbA | crbB | 193 | 0
dcba [1,7] | 31 | 00000 | A | B | 758 | 0
dcbf | 31 | 00000 | A | B | 86 | 0
dcbi [2] | 31 | 00000 | A | B | 470 | 0
dcbst | 31 | 00000 | A | B | 54 | 0
dcbt | 31 | 00000 | A | B | 278 | 0
dcbtst | 31 | 00000 | A | B | 246 | 0
dcbz | 31 | 00000 | A | B | 1014 | 0
divwx | 31 | D | A | B | OE, 491 | Rc
divwux | 31 | D | A | B | OE, 459 | Rc
eciwx | 31 | D | A | B | 310 | 0
ecowx | 31 | S | A | B | 438 | 0
eieio | 31 | 00000 | 00000 | 00000 | 854 | 0
eqvx | 31 | S | A | B | 284 | Rc
extsbx | 31 | S | A | 00000 | 954 | Rc
extshx | 31 | S | A | 00000 | 922 | Rc
fabsx | 63 | D | 00000 | B | 264 | Rc
faddx | 63 | D | A | B | 00000, 21 | Rc
faddsx | 59 | D | A | B | 00000, 21 | Rc
fcmpo | 63 | crfD, 00 | A | B | 32 | 0
fcmpu | 63 | crfD, 00 | A | B | 0 | 0
fctiwx | 63 | D | 00000 | B | 14 | Rc
fctiwzx | 63 | D | 00000 | B | 15 | Rc
fdivx | 63 | D | A | B | 00000, 18 | Rc
fdivsx | 59 | D | A | B | 00000, 18 | Rc
fmaddx | 63 | D | A | B | C, 29 | Rc
fmaddsx | 59 | D | A | B | C, 29 | Rc
fmrx | 63 | D | 00000 | B | 72 | Rc
fmsubx | 63 | D | A | B | C, 28 | Rc
fmsubsx | 59 | D | A | B | C, 28 | Rc
fmulx | 63 | D | A | 00000 | C, 25 | Rc
fmulsx | 59 | D | A | 00000 | C, 25 | Rc
fnabsx | 63 | D | 00000 | B | 136 | Rc
fnegx | 63 | D | 00000 | B | 40 | Rc
fnmaddx | 63 | D | A | B | C, 31 | Rc
fnmaddsx | 59 | D | A | B | C, 31 | Rc
fnmsubx | 63 | D | A | B | C, 30 | Rc
fnmsubsx | 59 | D | A | B | C, 30 | Rc
fresx [1] | 59 | D | 00000 | B | 00000, 24 | Rc
frspx | 63 | D | 00000 | B | 12 | Rc
frsqrtex [1] | 63 | D | 00000 | B | 00000, 26 | Rc
fselx [1] | 63 | D | A | B | C, 23 | Rc
fsqrtx [1,7] | 63 | D | 00000 | B | 00000, 22 | Rc
fsqrtsx [1,7] | 59 | D | 00000 | B | 00000, 22 | Rc
fsubx | 63 | D | A | B | 00000, 20 | Rc
fsubsx | 59 | D | A | B | 00000, 20 | Rc
icbi | 31 | 00000 | A | B | 982 | 0
isync | 19 | 00000 | 00000 | 00000 | 150 | 0
lbz | 34 | D | A | d (16-31)
lbzu | 35 | D | A | d (16-31)
lbzux | 31 | D | A | B | 119 | 0
lbzx | 31 | D | A | B | 87 | 0
lfd | 50 | D | A | d (16-31)
lfdu | 51 | D | A | d (16-31)
lfdux | 31 | D | A | B | 631 | 0
lfdx | 31 | D | A | B | 599 | 0
lfs | 48 | D | A | d (16-31)
lfsu | 49 | D | A | d (16-31)
lfsux | 31 | D | A | B | 567 | 0
lfsx | 31 | D | A | B | 535 | 0
lha | 42 | D | A | d (16-31)
lhau | 43 | D | A | d (16-31)
lhaux | 31 | D | A | B | 375 | 0
lhax | 31 | D | A | B | 343 | 0
lhbrx | 31 | D | A | B | 790 | 0
lhz | 40 | D | A | d (16-31)
lhzu | 41 | D | A | d (16-31)
lhzux | 31 | D | A | B | 311 | 0
lhzx | 31 | D | A | B | 279 | 0
lmw [3] | 46 | D | A | d (16-31)
lswi [3] | 31 | D | A | NB | 597 | 0
lswx [3] | 31 | D | A | B | 533 | 0
lwarx | 31 | D | A | B | 20 | 0
lwbrx | 31 | D | A | B | 534 | 0
lwz | 32 | D | A | d (16-31)
lwzu | 33 | D | A | d (16-31)
lwzux | 31 | D | A | B | 55 | 0
lwzx | 31 | D | A | B | 23 | 0
mcrf | 19 | crfD, 00 | crfS, 00 | 00000 | 0 | 0
mcrfs | 63 | crfD, 00 | crfS, 00 | 00000 | 64 | 0
mcrxr | 31 | crfD, 00 | 00000 | 00000 | 512 | 0
mfcr | 31 | D | 00000 | 00000 | 19 | 0
mffsx | 63 | D | 00000 | 00000 | 583 | Rc
mfmsr [2] | 31 | D | 00000 | 00000 | 83 | 0
mfspr [4] | 31 | D | spr (11-20) | 339 | 0
mfsr [2] | 31 | D | 0, SR | 00000 | 595 | 0
mfsrin [2] | 31 | D | 00000 | B | 659 | 0
mftb | 31 | D | tbr (11-20) | 371 | 0
mtcrf | 31 | S | 0, CRM, 0 (11-20) | 144 | 0
mtfsb0x | 63 | crbD | 00000 | 00000 | 70 | Rc
mtfsb1x | 63 | crbD | 00000 | 00000 | 38 | Rc
mtfsfx | 63 | 0, FM, 0 (6-15) | B | 711 | Rc
mtfsfix | 63 | crfD, 00 | 00000 | IMM, 0 | 134 | Rc
mtmsr [2] | 31 | S | 00000 | 00000 | 146 | 0
mtspr [4] | 31 | S | spr (11-20) | 467 | 0
mtsr [2] | 31 | S | 0, SR | 00000 | 210 | 0
mtsrin [2] | 31 | S | 00000 | B | 242 | 0
mulhwx | 31 | D | A | B | 0, 75 | Rc
mulhwux | 31 | D | A | B | 0, 11 | Rc
mulli | 7 | D | A | SIMM (16-31)
mullwx | 31 | D | A | B | OE, 235 | Rc
nandx | 31 | S | A | B | 476 | Rc
negx | 31 | D | A | 00000 | OE, 104 | Rc
norx | 31 | S | A | B | 124 | Rc
orx | 31 | S | A | B | 444 | Rc
orcx | 31 | S | A | B | 412 | Rc
ori | 24 | S | A | UIMM (16-31)
oris | 25 | S | A | UIMM (16-31)
rfi [2] | 19 | 00000 | 00000 | 00000 | 50 | 0
rlwimix | 20 | S | A | SH | MB, ME | Rc
rlwinmx | 21 | S | A | SH | MB, ME | Rc
rlwnmx | 23 | S | A | B | MB, ME | Rc
sc | 17 | 00000 | 00000 | 00000000000000 (16-29) | 1 (30) | 0
slwx | 31 | S | A | B | 24 | Rc
srawx | 31 | S | A | B | 792 | Rc
srawix | 31 | S | A | SH | 824 | Rc
srwx | 31 | S | A | B | 536 | Rc
stb | 38 | S | A | d (16-31)
stbu | 39 | S | A | d (16-31)
stbux | 31 | S | A | B | 247 | 0
stbx | 31 | S | A | B | 215 | 0
stfd | 54 | S | A | d (16-31)
stfdu | 55 | S | A | d (16-31)
stfdux | 31 | S | A | B | 759 | 0
stfdx | 31 | S | A | B | 727 | 0
stfiwx [1] | 31 | S | A | B | 983 | 0
stfs | 52 | S | A | d (16-31)
stfsu | 53 | S | A | d (16-31)
stfsux | 31 | S | A | B | 695 | 0
stfsx | 31 | S | A | B | 663 | 0
sth | 44 | S | A | d (16-31)
sthbrx | 31 | S | A | B | 918 | 0
sthu | 45 | S | A | d (16-31)
sthux | 31 | S | A | B | 439 | 0
sthx | 31 | S | A | B | 407 | 0
stmw [3] | 47 | S | A | d (16-31)
stswi [3] | 31 | S | A | NB | 725 | 0
stswx [3] | 31 | S | A | B | 661 | 0
stw | 36 | S | A | d (16-31)
stwbrx | 31 | S | A | B | 662 | 0
stwcx. | 31 | S | A | B | 150 | 1
stwu | 37 | S | A | d (16-31)
stwux | 31 | S | A | B | 183 | 0
stwx | 31 | S | A | B | 151 | 0
subfx | 31 | D | A | B | OE, 40 | Rc
subfcx | 31 | D | A | B | OE, 8 | Rc
subfex | 31 | D | A | B | OE, 136 | Rc
subfic | 08 | D | A | SIMM (16-31)
subfmex | 31 | D | A | 00000 | OE, 232 | Rc
subfzex | 31 | D | A | 00000 | OE, 200 | Rc
sync | 31 | 00000 | 00000 | 00000 | 598 | 0
tlbia [1,3,7] | 31 | 00000 | 00000 | 00000 | 370 | 0
tlbie [1,3] | 31 | 00000 | 00000 | B | 306 | 0
tlbsync [1,3] | 31 | 00000 | 00000 | 00000 | 566 | 0
tw | 31 | TO | A | B | 4 | 0
twi | 03 | TO | A | SIMM (16-31)
xorx | 31 | S | A | B | 316 | Rc
xori | 26 | S | A | UIMM (16-31)
xoris | 27 | S | A | UIMM (16-31)
Notes:
1 Optional instruction
2 Supervisor-level instruction
3 Load/store string/multiple instruction
4 Supervisor- and user-level instruction
A.2 Instructions Sorted by Opcode
Table A-2 lists the instructions defined in the PowerPC architecture in numeric order by opcode.
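When looking up an instruction word in the opcode-sorted table, the primary opcode is bits 0-5 and, for opcode 31 instructions with a 10-bit extended opcode, the extended opcode is bits 21-30. The Python sketch below extracts both and prints them in binary. The function names are hypothetical, and the sample word is an lwzx instruction assembled by hand from the field layout in this appendix.

```python
def primary_opcode(word):
    """Bits 0-5 of the instruction word (bit 0 is the most-significant bit)."""
    return (word >> 26) & 0x3F

def extended_opcode(word):
    """Bits 21-30 of the instruction word (10-bit extended opcode)."""
    return (word >> 1) & 0x3FF

# lwzx r3,r4,r5: opcode 31, D=3, A=4, B=5, extended opcode 23, bit 31 = 0.
word = (31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (23 << 1)
print(hex(word))
print(format(primary_opcode(word), "06b"), format(extended_opcode(word), "010b"))
```

Note that this 10-bit extraction does not apply directly to forms that place OE in bit 21 or a C operand in bits 21-25; those need the narrower extended opcode field.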
Key: Reserved bits
Table A-2. Complete Instruction List Sorted by Opcode
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
twi | 000011 | TO | A | SIMM (16-31)
mulli | 000111 | D | A | SIMM (16-31)
subfic | 001000 | D | A | SIMM (16-31)
cmpli | 001010 | crfD, 0, L | A | UIMM (16-31)
cmpi | 001011 | crfD, 0, L | A | SIMM (16-31)
addic | 001100 | D | A | SIMM (16-31)
addic. | 001101 | D | A | SIMM (16-31)
addi | 001110 | D | A | SIMM (16-31)
addis | 001111 | D | A | SIMM (16-31)
bcx | 010000 | BO | BI | BD (16-29) | AA (30) | LK
sc | 010001 | 00000 | 00000 | 00000000000000 (16-29) | 1 (30) | 0
bx | 010010 | LI (6-29) | AA (30) | LK
mcrf | 010011 | crfD, 00 | crfS, 00 | 00000 | 0000000000 | 0
bclrx | 010011 | BO | BI | 00000 | 0000010000 | LK
crnor | 010011 | crbD | crbA | crbB | 0000100001 | 0
rfi [1] | 010011 | 00000 | 00000 | 00000 | 0000110010 | 0
crandc | 010011 | crbD | crbA | crbB | 0010000001 | 0
isync | 010011 | 00000 | 00000 | 00000 | 0010010110 | 0
crxor | 010011 | crbD | crbA | crbB | 0011000001 | 0
crnand | 010011 | crbD | crbA | crbB | 0011100001 | 0
crand | 010011 | crbD | crbA | crbB | 0100000001 | 0
creqv | 010011 | crbD | crbA | crbB | 0100100001 | 0
crorc | 010011 | crbD | crbA | crbB | 0110100001 | 0
cror | 010011 | crbD | crbA | crbB | 0111000001 | 0
bcctrx | 010011 | BO | BI | 00000 | 1000010000 | LK
rlwimix | 010100 | S | A | SH | MB, ME | Rc
rlwinmx | 010101 | S | A | SH | MB, ME | Rc
rlwnmx | 010111 | S | A | B | MB, ME | Rc
ori | 011000 | S | A | UIMM (16-31)
oris | 011001 | S | A | UIMM (16-31)
xori | 011010 | S | A | UIMM (16-31)
xoris | 011011 | S | A | UIMM (16-31)
andi. | 011100 | S | A | UIMM (16-31)
andis. | 011101 | S | A | UIMM (16-31)
cmp | 011111 | crfD, 0, L | A | B | 0000000000 | 0
tw | 011111 | TO | A | B | 0000000100 | 0
subfcx | 011111 | D | A | B | OE, 000001000 | Rc
addcx | 011111 | D | A | B | OE, 000001010 | Rc
mulhwux | 011111 | D | A | B | 0, 000001011 | Rc
mfcr | 011111 | D | 00000 | 00000 | 0000010011 | 0
lwarx | 011111 | D | A | B | 0000010100 | 0
lwzx | 011111 | D | A | B | 0000010111 | 0
slwx | 011111 | S | A | B | 0000011000 | Rc
cntlzwx | 011111 | S | A | 00000 | 0000011010 | Rc
andx | 011111 | S | A | B | 0000011100 | Rc
cmpl | 011111 | crfD, 0, L | A | B | 0000100000 | 0
subfx | 011111 | D | A | B | OE, 000101000 | Rc
dcbst | 011111 | 00000 | A | B | 0000110110 | 0
lwzux | 011111 | D | A | B | 0000110111 | 0
andcx | 011111 | S | A | B | 0000111100 | Rc
mulhwx | 011111 | D | A | B | 0, 001001011 | Rc
mfmsr [1] | 011111 | D | 00000 | 00000 | 0001010011 | 0
dcbf | 011111 | 00000 | A | B | 0001010110 | 0
lbzx | 011111 | D | A | B | 0001010111 | 0
negx | 011111 | D | A | 00000 | OE, 001101000 | Rc
lbzux | 011111 | D | A | B | 0001110111 | 0
norx | 011111 | S | A | B | 0001111100 | Rc
subfex | 011111 | D | A | B | OE, 010001000 | Rc
addex | 011111 | D | A | B | OE, 010001010 | Rc
mtcrf | 011111 | S | 0, CRM, 0 (11-20) | 0010010000 | 0
mtmsr [1] | 011111 | S | 00000 | 00000 | 0010010010 | 0
stwcx. | 011111 | S | A | B | 0010010110 | 1
stwx | 011111 | S | A | B | 0010010111 | 0
stwux | 011111 | S | A | B | 0010110111 | 0
subfzex | 011111 | D | A | 00000 | OE, 011001000 | Rc
addzex | 011111 | D | A | 00000 | OE, 011001010 | Rc
mtsr [1] | 011111 | S | 0, SR | 00000 | 0011010010 | 0
stbx | 011111 | S | A | B | 0011010111 | 0
subfmex | 011111 | D | A | 00000 | OE, 011101000 | Rc
addmex | 011111 | D | A | 00000 | OE, 011101010 | Rc
mullwx | 011111 | D | A | B | OE, 011101011 | Rc
mtsrin [1] | 011111 | S | 00000 | B | 0011110010 | 0
dcbtst | 011111 | 00000 | A | B | 0011110110 | 0
stbux | 011111 | S | A | B | 0011110111 | 0
addx | 011111 | D | A | B | OE, 100001010 | Rc
dcbt | 011111 | 00000 | A | B | 0100010110 | 0
lhzx | 011111 | D | A | B | 0100010111 | 0
eqvx | 011111 | S | A | B | 0100011100 | Rc
tlbie [1,2] | 011111 | 00000 | 00000 | B | 0100110010 | 0
eciwx | 011111 | D | A | B | 0100110110 | 0
lhzux | 011111 | D | A | B | 0100110111 | 0
xorx | 011111 | S | A | B | 0100111100 | Rc
mfspr [3] | 011111 | D | spr (11-20) | 0101010011 | 0
lhax | 011111 | D | A | B | 0101010111 | 0
tlbia [1,2,4] | 011111 | 00000 | 00000 | 00000 | 0101110010 | 0
mftb | 011111 | D | tbr (11-20) | 0101110011 | 0
lhaux | 011111 | D | A | B | 0101110111 | 0
sthx | 011111 | S | A | B | 0110010111 | 0
orcx | 011111 | S | A | B | 0110011100 | Rc
ecowx | 011111 | S | A | B | 0110110110 | 0
sthux | 011111 | S | A | B | 0110110111 | 0
orx | 011111 | S | A | B | 0110111100 | Rc
divwux | 011111 | D | A | B | OE, 111001011 | Rc
mtspr [3] | 011111 | S | spr (11-20) | 0111010011 | 0
dcbi [1] | 011111 | 00000 | A | B | 0111010110 | 0
nandx | 011111 | S | A | B | 0111011100 | Rc
divwx | 011111 | D | A | B | OE, 111101011 | Rc
mcrxr | 011111 | crfD, 00 | 00000 | 00000 | 1000000000 | 0
lswx [5] | 011111 | D | A | B | 1000010101 | 0
lwbrx | 011111 | D | A | B | 1000010110 | 0
lfsx | 011111 | D | A | B | 1000010111 | 0
srwx | 011111 | S | A | B | 1000011000 | Rc
tlbsync [1,2] | 011111 | 00000 | 00000 | 00000 | 1000110110 | 0
lfsux | 011111 | D | A | B | 1000110111 | 0
mfsr [1] | 011111 | D | 0, SR | 00000 | 1001010011 | 0
lswi [5] | 011111 | D | A | NB | 1001010101 | 0
sync | 011111 | 00000 | 00000 | 00000 | 1001010110 | 0
lfdx | 011111 | D | A | B | 1001010111 | 0
lfdux | 011111 | D | A | B | 1001110111 | 0
mfsrin [1] | 011111 | D | 00000 | B | 1010010011 | 0
stswx [5] | 011111 | S | A | B | 1010010101 | 0
stwbrx | 011111 | S | A | B | 1010010110 | 0
stfsx | 011111 | S | A | B | 1010010111 | 0
stfsux | 011111 | S | A | B | 1010110111 | 0
stswi [5] | 011111 | S | A | NB | 1011010101 | 0
stfdx | 011111 | S | A | B | 1011010111 | 0
dcba [2,4] | 011111 | 00000 | A | B | 1011110110 | 0
stfdux | 011111 | S | A | B | 1011110111 | 0
lhbrx | 011111 | D | A | B | 1100010110 | 0
srawx | 011111 | S | A | B | 1100011000 | Rc
srawix | 011111 | S | A | SH | 1100111000 | Rc
eieio | 011111 | 00000 | 00000 | 00000 | 1101010110 | 0
sthbrx | 011111 | S | A | B | 1110010110 | 0
extshx | 011111 | S | A | 00000 | 1110011010 | Rc
extsbx | 011111 | S | A | 00000 | 1110111010 | Rc
icbi | 011111 | 00000 | A | B | 1111010110 | 0
stfiwx [2] | 011111 | S | A | B | 1111010111 | 0
dcbz | 011111 | 00000 | A | B | 1111110110 | 0
lwz | 100000 | D | A | d (16-31)
lwzu | 100001 | D | A | d (16-31)
lbz | 100010 | D | A | d (16-31)
lbzu | 100011 | D | A | d (16-31)
stw | 100100 | S | A | d (16-31)
stwu | 100101 | S | A | d (16-31)
stb | 100110 | S | A | d (16-31)
stbu | 100111 | S | A | d (16-31)
lhz | 101000 | D | A | d (16-31)
lhzu | 101001 | D | A | d (16-31)
lha | 101010 | D | A | d (16-31)
lhau | 101011 | D | A | d (16-31)
sth | 101100 | S | A | d (16-31)
sthu | 101101 | S | A | d (16-31)
lmw [5] | 101110 | D | A | d (16-31)
stmw [5] | 101111 | S | A | d (16-31)
lfs | 110000 | D | A | d (16-31)
lfsu | 110001 | D | A | d (16-31)
lfd | 110010 | D | A | d (16-31)
lfdu | 110011 | D | A | d (16-31)
stfs | 110100 | S | A | d (16-31)
stfsu | 110101 | S | A | d (16-31)
stfd | 110110 | S | A | d (16-31)
stfdu | 110111 | S | A | d (16-31)
fdivsx | 111011 | D | A | B | 00000, 10010 | Rc
fsubsx | 111011 | D | A | B | 00000, 10100 | Rc
faddsx | 111011 | D | A | B | 00000, 10101 | Rc
fsqrtsx [2,4] | 111011 | D | 00000 | B | 00000, 10110 | Rc
fresx [2] | 111011 | D | 00000 | B | 00000, 11000 | Rc
fmulsx | 111011 | D | A | 00000 | C, 11001 | Rc
fmsubsx | 111011 | D | A | B | C, 11100 | Rc
fmaddsx | 111011 | D | A | B | C, 11101 | Rc
fnmsubsx | 111011 | D | A | B | C, 11110 | Rc
fnmaddsx | 111011 | D | A | B | C, 11111 | Rc
fcmpu | 111111 | crfD, 00 | A | B | 0000000000 | 0
frspx | 111111 | D | 00000 | B | 0000001100 | Rc
fctiwx | 111111 | D | 00000 | B | 0000001110 | Rc
fctiwzx | 111111 | D | 00000 | B | 0000001111 | Rc
fdivx | 111111 | D | A | B | 00000, 10010 | Rc
fsubx | 111111 | D | A | B | 00000, 10100 | Rc
faddx | 111111 | D | A | B | 00000, 10101 | Rc
fsqrtx [2,4] | 111111 | D | 00000 | B | 00000, 10110 | Rc
fselx [2] | 111111 | D | A | B | C, 10111 | Rc
fmulx | 111111 | D | A | 00000 | C, 11001 | Rc
fmsubx | 111111 | D | A | B | C, 11100 | Rc
fmaddx | 111111 | D | A | B | C, 11101 | Rc
fnmsubx | 111111 | D | A | B | C, 11110 | Rc
fnmaddx | 111111 | D | A | B | C, 11111 | Rc
fcmpo | 111111 | crfD, 00 | A | B | 0000100000 | 0
mtfsb1x | 111111 | crbD | 00000 | 00000 | 0000100110 | Rc
fnegx | 111111 | D | 00000 | B | 0000101000 | Rc
mcrfs | 111111 | crfD, 00 | crfS, 00 | 00000 | 0001000000 | 0
mtfsb0x | 111111 | crbD | 00000 | 00000 | 0001000110 | Rc
fmrx | 111111 | D | 00000 | B | 0001001000 | Rc
mtfsfix | 111111 | crfD, 00 | 00000 | IMM, 0 | 0010000110 | Rc
fnabsx | 111111 | D | 00000 | B | 0010001000 | Rc
fabsx | 111111 | D | 00000 | B | 0100001000 | Rc
mffsx | 111111 | D | 00000 | 00000 | 1001000111 | Rc
mtfsfx | 111111 | 0, FM, 0 (6-15) | B | 1011000111 | Rc
Notes:
1 Supervisor-level instruction
2 Optional instruction
3 Supervisor- and user-level instruction
4 32-bit instruction not implemented by the MPC750
5 Load/store string/multiple instruction
A.3 Instructions Grouped by Functional Categories
Key: Reserved bits
Table A-3 through Table A-28 list the PowerPC instructions grouped by function.
Table A-3. Integer Arithmetic Instructions
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
addx | 31 | D | A | B | OE, 266 | Rc
addcx | 31 | D | A | B | OE, 10 | Rc
addex | 31 | D | A | B | OE, 138 | Rc
addi | 14 | D | A | SIMM (16-31)
addic | 12 | D | A | SIMM (16-31)
addic. | 13 | D | A | SIMM (16-31)
addis | 15 | D | A | SIMM (16-31)
addmex | 31 | D | A | 00000 | OE, 234 | Rc
addzex | 31 | D | A | 00000 | OE, 202 | Rc
divwx | 31 | D | A | B | OE, 491 | Rc
divwux | 31 | D | A | B | OE, 459 | Rc
mulhwx | 31 | D | A | B | 0, 75 | Rc
mulhwux | 31 | D | A | B | 0, 11 | Rc
mulli | 07 | D | A | SIMM (16-31)
mullwx | 31 | D | A | B | OE, 235 | Rc
negx | 31 | D | A | 00000 | OE, 104 | Rc
subfx | 31 | D | A | B | OE, 40 | Rc
subfcx | 31 | D | A | B | OE, 8 | Rc
subfic | 08 | D | A | SIMM (16-31)
subfex | 31 | D | A | B | OE, 136 | Rc
subfmex | 31 | D | A | 00000 | OE, 232 | Rc
subfzex | 31 | D | A | 00000 | OE, 200 | Rc
Table A-4. Integer Compare Instructions
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
cmp | 31 | crfD, 0, L | A | B | 0000000000 | 0
cmpi | 11 | crfD, 0, L | A | SIMM (16-31)
cmpl | 31 | crfD, 0, L | A | B | 32 | 0
cmpli | 10 | crfD, 0, L | A | UIMM (16-31)
Table A-5. Integer Logical Instructions
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31
andx | 31 | S | A | B | 28 | Rc
andcx | 31 | S | A | B | 60 | Rc
andi. | 28 | S | A | UIMM (16-31)
andis. | 29 | S | A | UIMM (16-31)
cntlzwx | 31 | S | A | 00000 | 26 | Rc
eqvx | 31 | S | A | B | 284 | Rc
extsbx | 31 | S | A | 00000 | 954 | Rc
extshx | 31 | S | A | 00000 | 922 | Rc
nandx | 31 | S | A | B | 476 | Rc
norx | 31 | S | A | B | 124 | Rc
orx | 31 | S | A | B | 444 | Rc
orcx | 31 | S | A | B | 412 | Rc
ori | 24 | S | A | UIMM (16-31)
oris | 25 | S | A | UIMM (16-31)
xorx | 31 | S | A | B | 316 | Rc
xori | 26 | S | A | UIMM (16-31)
xoris | 27 | S | A | UIMM (16-31)
Table A-6. Integer Rotate Instructions
Name | 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31
rlwimix | 22 | S | A | SH | MB | ME | Rc
rlwinmx | 20 | S | A | SH | MB | ME | Rc
rlwnmx | 21 | S | A | SH | MB | ME | Rc
Table A-7. Integer Shift Instructions
Name     0-5   6-10   11-15   16-20   21-30   31
slwx     31    S      A       B       24      Rc
srawx    31    S      A       B       792     Rc
srawix   31    S      A       SH      824     Rc
srwx     31    S      A       B       536     Rc
Table A-8. Floating-Point Arithmetic Instructions
Name           0-5   6-10   11-15   16-20   21-25   26-30   31
faddx          63    D      A       B       00000   21      Rc
faddsx         59    D      A       B       00000   21      Rc
fdivx          63    D      A       B       00000   18      Rc
fdivsx         59    D      A       B       00000   18      Rc
fmulx          63    D      A       00000   C       25      Rc
fmulsx         59    D      A       00000   C       25      Rc
fresx 1        59    D      00000   B       00000   24      Rc
frsqrtex 1     63    D      00000   B       00000   26      Rc
fsubx          63    D      A       B       00000   20      Rc
fsubsx         59    D      A       B       00000   20      Rc
fselx 1        63    D      A       B       C       23      Rc
fsqrtx 1, 2    63    D      00000   B       00000   22      Rc
fsqrtsx 1, 2   59    D      00000   B       00000   22      Rc
Notes:
1 Optional instruction
2 32-bit instruction not implemented by the MPC750
Table A-9. Floating-Point Multiply-Add Instructions
Name       0-5   6-10   11-15   16-20   21-25   26-30   31
fmaddx     63    D      A       B       C       29      Rc
fmaddsx    59    D      A       B       C       29      Rc
fmsubx     63    D      A       B       C       28      Rc
fmsubsx    59    D      A       B       C       28      Rc
fnmaddx    63    D      A       B       C       31      Rc
fnmaddsx   59    D      A       B       C       31      Rc
fnmsubx    63    D      A       B       C       30      Rc
fnmsubsx   59    D      A       B       C       30      Rc
Table A-10. Floating-Point Rounding and Conversion Instructions
Name      0-5   6-10   11-15   16-20   21-30   31
fctiwx    63    D      00000   B       14      Rc
fctiwzx   63    D      00000   B       15      Rc
frspx     63    D      00000   B       12      Rc
Table A-11. Floating-Point Compare Instructions
Name    0-5   6-8    9-10   11-15   16-20   21-30   31
fcmpo   63    crfD   00     A       B       32      0
fcmpu   63    crfD   00     A       B       0       0
Table A-12. Floating-Point Status and Control Register Instructions
Name mcrfs mffsx mtfsb0x mtfsb1x mtfsfx mtfsfix
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
63 63 63 63 31 63 0
crfD D crbD crbD
00
crfS
00
00000 00000 00000 00000
64 583 70 38 711 0 134
0 Rc Rc Rc Rc Rc
00000 00000 00000
FM
0 00000
B
IMM
crfD
00
Table A-13. Integer Load Instructions
Name lbz lbzu lbzux lbzx lha lhau lhaux lhax lhz lhzu lhzux lhzx lwz lwzu lwzux lwzx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
34 35 31 31 42 43 31 31 40 41 31 31 32 33 31 31
D D D D D D D D D D D D D D D D
A A A A A A A A A A A A A A A A B B B B B B B B
d d 119 87 d d 375 343 d d 311 279 d d 55 23 0 0 0 0 0 0 0 0
Table A-14. Integer Store Instructions
Name stb stbu stbux stbx sth sthu sthux sthx stw stwu stwux stwx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
38 39 31 31 44 45 31 31 36 37 31 31
S S S S S S S S S S S S
A A A A A A A A A A A A B B B B B B
d d 247 215 d d 439 407 d d 183 151 0 0 0 0 0 0
Table A-15. Integer Load and Store with Byte Reverse Instructions
Name     0-5   6-10   11-15   16-20   21-30   31
lhbrx    31    D      A       B       790     0
lwbrx    31    D      A       B       534     0
sthbrx   31    S      A       B       918     0
stwbrx   31    S      A       B       662     0
Table A-16. Integer Load and Store Multiple Instructions
Name lmw stmw
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
46 47 Note:
D S
A A
d d
Table A-17. Integer Load and Store String Instructions
Name    0-5   6-10   11-15   16-20   21-30   31
lswi    31    D      A       NB      597     0
lswx    31    D      A       B       533     0
stswi   31    S      A       NB      725     0
stswx   31    S      A       B       661     0
Table A-18. Memory Synchronization Instructions
Name     0-5   6-10    11-15   16-20   21-30   31
eieio    31    00000   00000   00000   854     0
isync    19    00000   00000   00000   150     0
lwarx    31    D       A       B       20      0
stwcx.   31    S       A       B       150     1
sync     31    00000   00000   00000   598     0
Table A-19. Floating-Point Load Instructions
Name lfd lfdu lfdux lfdx lfs lfsu
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
50 51 31 31 48 49
D D D D D D
A A A A A A B B
d d 631 599 d d 0 0
lfsux lfsx 31 31 D D A A B B 567 535 0 0
Table A-20. Floating-Point Store Instructions
Name stfd stfdu stfdux stfdx stfiwx 1 stfs stfsu stfsux stfsx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
54 55 31 31 31 52 53 31 31
1Optional
S S S S S S S S S instruction
A A A A A A A A A B B B B B
d d 759 727 983 d d 695 663 0 0 0 0 0
Table A-21. Floating-Point Move Instructions
Name     0-5   6-10   11-15   16-20   21-30   31
fabsx    63    D      00000   B       264     Rc
fmrx     63    D      00000   B       72      Rc
fnabsx   63    D      00000   B       136     Rc
fnegx    63    D      00000   B       40      Rc
Table A-22. Branch Instructions
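Name     Fields (bit positions in brackets)
bx       18 [0-5]   LI [6-29]   AA [30]   LK [31]
bcx      16 [0-5]   BO [6-10]   BI [11-15]   BD [16-29]   AA [30]   LK [31]
bcctrx   19 [0-5]   BO [6-10]   BI [11-15]   00000 [16-20]   528 [21-30]   LK [31]
bclrx    19 [0-5]   BO [6-10]   BI [11-15]   00000 [16-20]   16 [21-30]    LK [31]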
Table A-23. Condition Register Logical Instructions
Name     0-5   6-10      11-15     16-20   21-30        31
crand    19    crbD      crbA      crbB    257          0
crandc   19    crbD      crbA      crbB    129          0
creqv    19    crbD      crbA      crbB    289          0
crnand   19    crbD      crbA      crbB    225          0
crnor    19    crbD      crbA      crbB    33           0
cror     19    crbD      crbA      crbB    449          0
crorc    19    crbD      crbA      crbB    417          0
crxor    19    crbD      crbA      crbB    193          0
mcrf     19    crfD 00   crfS 00   00000   0000000000   0
Table A-24. System Linkage Instructions
Name rfi 1 sc
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
19 17 Note:
00000 00000
00000 00000
00000
50
0 10
000000000000000
1Supervisor-level
instruction
Table A-25. Trap Instructions
Name   0-5   6-10   11-15   16-20   21-30   31
tw     31    TO     A       B       4       0
twi    03    TO     A       SIMM (bits 16-31)
Table A-26. Processor Control Instructions
Name mcrxr mfcr mfmsr 1 mfspr 2 mftb mtcrf mtmsr 1 mtspr 2
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31 31 31 31 31 31 31 31 Notes:
crfS D D D D S S D
00
00000 00000 00000 spr tpr 0 00000 spr
CRM
00000 00000 00000
512 19 83 339 371 0 144 146 467
0 0 0 0 0 0 0 0
00000
1Supervisor-level 2Supervisor-
instruction
and user-level instruction
Table A-27. Cache Management Instructions
Name dcba 1, 3
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31
00000
A
B
758
0
dcbf dcbi 2 dcbst dcbt dcbtst dcbz icbi 31 31 31 31 31 31 31 Notes:
1Optional 3
00000 00000 00000 00000 00000 00000 00000
A A A A A A A
B B B B B B B
86 470 54 278 246 1014 982
0 0 0 0 0 0 0
instruction instruction
2Supervisor-level
32-bit instruction not implemented by the MPC750
Table A-28. Segment Register Manipulation Instructions
Name mfsr 1 mfsrin 1 mtsr 1 mtsrin 1
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31 31 31 31 Note:
1Supervisor-level
D D S S
0
SR
00000 B 00000 B
595 659 210 242
0 0 0 0
00000 0
SR
00000
instruction
Table A-29. Lookaside Buffer Management Instructions
Name tlbia 1, 2 tlbie 1, 2 tlbsync 1
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31 31 31 Notes:
00000 00000 00000
00000 00000 00000
00000 B 00000
370 306 566
0 0 0
1Supervisor-level 2Optional
instruction
instruction
Table A-30. External Control Instructions
Name    0-5   6-10   11-15   16-20   21-30   31
eciwx   31    D      A       B       310     0
ecowx   31    S      A       B       438     0
A.4 Instructions Sorted by Form
Key: Reserved bits
Table A-31 through Table A-42 list the PowerPC instructions grouped by form.
Table A-31. I-Form
OPCD LI Specific Instruction Name bx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
AA LK
18
LI
AA LK
Table A-32. B-Form
OPCD BO BI Specific Instruction Name bcx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
BD
AA LK
16
BO
BI
BD
AA LK
Table A-33. SC-Form
OPCD 00000 00000 Specific Instruction Name sc
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
000000000000000
10
17
00000
00000
000000000000000
10
Table A-34. D-Form
OPCD OPCD OPCD OPCD OPCD OPCD OPCD crfD crfD TO D D S S 0L 0L A A A A A A A d SIMM d UIMM SIMM UIMM SIMM
Specific Instructions Name addi addic addic. addis andi. andis. cmpi cmpli lbz lbzu lfd lfdu lfs lfsu lha lhau lhz lhzu lmw 1 lwz lwzu mulli ori oris stb stbu stfd stfdu stfs stfsu sth sthu
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
14 12 13 15 28 29 11 10 34 35 50 51 48 49 42 43 40 41 46 32 33 7 24 25 38 39 54 55 52 53 44 45 crfD crfD
D D D D S S 0L 0L D D D D D D D D D D D D D D S S S S S S S S S S
A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
SIMM SIMM SIMM SIMM UIMM UIMM SIMM UIMM d d d d d d d d d d d d d SIMM UIMM UIMM d d d d d d d d
stmw 1 stw stwu subfic twi xori xoris 47 36 37 08 03 26 27 Note:
1Load/store
S S S D TO S S
A A A A A A A
d d d SIMM SIMM UIMM UIMM
string/multiple instruction
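The D-form layout in Table A-34 (a 6-bit primary opcode, two 5-bit register fields, and a 16-bit immediate or displacement) can be illustrated with a minimal Python sketch. The function name encode_d_form is hypothetical and is not part of any Motorola tool. The sketch assumes the PowerPC convention that bit 0 is the most-significant bit of the 32-bit instruction word.

```python
def encode_d_form(opcd, rd, ra, imm):
    """Pack a D-form word: OPCD (bits 0-5), D or S (6-10), A (11-15), immediate (16-31).

    Hypothetical helper for illustration. Bit 0 is the most-significant bit.
    """
    if not (0 <= opcd < 64 and 0 <= rd < 32 and 0 <= ra < 32):
        raise ValueError("field out of range")
    if not (-0x8000 <= imm <= 0xFFFF):
        raise ValueError("immediate does not fit in 16 bits")
    # A negative SIMM is stored as its 16-bit two's-complement pattern.
    return (opcd << 26) | (rd << 21) | (ra << 16) | (imm & 0xFFFF)


ADDI = 14  # primary opcode of addi in Table A-34

# addi r3,0,1 (load the constant 1 into r3)
print(f"{encode_d_form(ADDI, 3, 0, 1):08X}")
```

The same helper covers the load and store rows of the table by passing the displacement d as the immediate and the source register S in place of D.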
Table A-35. X-Form
OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD crfD crfD crfD crfD crfD TO D D crbD 00000 D D D D D S S S S S S S S S 0L 00 00 00 00 crfS 0 0 A A 00000 00000
SR
B NB B 00000 00000 B B B NB 00000 B 00000 00000 SH B B 00 00000 00000
IMM
XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO 0 XO XO XO XO XO XO
0 0 0 0 0 Rc 1 0 0 Rc 0 0 0 Rc 0 0 0 0 Rc 0 Rc Rc Rc 0
A A A A A 00000 00000
SR
A A A
00000 00000 A 00000 00000 00000 A
B B 00000 00000 B
OPCD OPCD 00000 00000 00000 00000 B 00000 XO XO 0 0
Specific Instructions andx andcx cmp cmpl cntlzwx dcba 1, 6 dcbf dcbi 2 dcbst dcbt dcbtst dcbz eciwx ecowx eieio eqvx extsbx extshx fabsx fcmpo fcmpu fctiwx fctiwzx fmrx fnabsx fnegx frspx icbi lbzux lbzx lfdux 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 63 63 63 63 63 63 63 63 63 31 31 31 31 crfD crfD D D D D D D 00000 D D D crfD crfD S 00000 00000 00000 00000 00000 00000 00000 D S 00000 S S S D 00 00 S S 0L 0L A A A A A A A A A A A A A A 00000 A A A 00000 A A 00000 00000 00000 00000 00000 00000 A A A A B B B B 00000 B B B B B B B B B 00000 B 00000 00000 B B B B B B B B B B B B B 28 60 0 32 26 758 86 470 54 278 246 1014 310 438 854 284 954 922 264 32 0 14 15 72 136 40 12 982 119 87 631 Rc Rc 0 0 Rc 0 0 0 0 0 0 0 0 0 0 Rc Rc Rc Rc 0 0 Rc Rc Rc Rc Rc Rc 0 0 0 0
lfdx lfsux lfsx lhaux lhax lhbrx lhzux lhzx lswi 3 lswx 4 lwarx lwbrx lwzux lwzx mcrfs mcrxr mfcr mffsx mfmsr 3 mfsr 3 mfsrin 3 mtfsb0x mtfsb1x mtfsfix mtmsr 3 mtsr 3 nandx norx orx orcx slwx srawx srawix srwx 31 31 31 31 31 31 31 31 31 31 31 31 31 31 63 31 31 63 31 31 31 63 63 63 31 31 31 31 31 31 31 31 31 31 crfD crfD D D D D D crbD crfD crbD S S S S S S S S S S 0 00 0 D D D D D D D D D D D D D D 00 00 crfS A A A A A A A A A A A A A A 00 B B B B B B B B NB B B B B B 00000 00000 00000 00000 00000 00000 B 00000 00000
IMM
599 567 535 375 343 790 311 279 597 533 20 534 55 23 64 512 19 583 83 595 659 70 38 0 134 146 210 476 124 444 412 24 792 824 536
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Rc 0 0 0 Rc Rc Rc 0 0 Rc Rc Rc Rc Rc Rc Rc Rc
00000 00000 00000 00000
SR
00000 00000 00000 00000 00000
SR
00000 00000 B B B B B B SH B
A A A A A A A A
stbux stbx stfdux stfdx stfiwx 1 stfsux stfsx sthbrx sthux sthx stswi 4 stswx 4 stwbrx stwcx. stwux stwx sync tlbia 2, 3, 6 tlbie 2, 3 tlbsync 2, 3 tw xorx 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 Notes:
1Optional
S S S S S S S S S S S S S S S S 00000 00000 00000 00000 TO S
A A A A A A A A A A A A A A A A 00000 00000 00000 00000 A A
B B B B B B B B B B NB B B B B B 00000 00000 B 00000 B B
247 215 759 727 983 695 663 918 439 407 725 661 662 150 183 151 598 370 306 566 4 316
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 Rc
instruction instruction string/multiple instruction
2Supervisor-level 3Load/store
Table A-36. XL-Form
OPCD OPCD OPCD OPCD BO crbD crfD 00 BI crbA crfS 00 00000 crbB 00000 00000 XO XO XO XO LK 0 0 0
00000
00000
Specific Instructions Name bcctrx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
19
BO
BI
00000
528
LK
bclrx crand crandc creqv crnand crnor cror crorc crxor isync mcrf rfi 1 19 19 19 19 19 19 19 19 19 19 19 19 Note:
1Supervisor-level
BO crbD crbD crbD crbD crbD crbD crbD crbD 00000 crfD 00
BI crbA crbA crbA crbA crbA crbA crbA crbA 00000 crfS 00
00000 crbB crbB crbB crbB crbB crbB crbB crbB 00000 00000 00000
16 257 129 289 225 33 449 417 193 150 0 50
LK 0 0 0 0 0 0 0 0 0 0 0
00000
00000
instruction
Table A-37. XFX-Form
OPCD OPCD OPCD OPCD D D S D 0 spr
CRM
XO 0 XO XO XO
0 0 0 0
spr tbr Specific Instructions
Name mfspr 1 mftb mtcrf mtspr 1
0
5
6
7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31 31 31 31 Note:
1Supervisor-
D D S D 0
spr tbr
CRM
339 371 0 144 467
0 0 0 0
spr
and user-level instruction
Table A-38. XFL-Form
OPCD 0
FM
0
B
XO
Rc
Specific Instructions Name mtfsfx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 FM
63
0
0
B
711
Rc
Table A-39. XO-Form
OPCD OPCD OPCD D D D A A A B B 00000 OE 0 OE XO XO XO Rc Rc Rc
Specific Instructions Name addx addcx addex addmex addzex divwx divwux mulhwx mulhwux mullwx negx subfx subfcx subfex subfmex subfzex
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
D D D D D D D D D D D D D D D D
A A A A A A A A A A A A A A A A
B B B 00000 00000 B B B B B 00000 B B B 00000 00000
OE OE OE OE OE OE OE 0 0 OE OE OE OE OE OE OE
266 10 138 234 202 491 459 75 11 235 104 40 8 136 232 200
Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc
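As a worked illustration of the XO-form layout in Table A-39, the following Python sketch packs the fields into a 32-bit word. The helper name encode_xo_form is hypothetical. The field positions follow the bit ruler above: OPCD in bits 0-5, D in 6-10, A in 11-15, B in 16-20, OE in bit 21, the 9-bit extended opcode in bits 22-30, and Rc in bit 31, with bit 0 as the most-significant bit.

```python
def encode_xo_form(rd, ra, rb, xo, oe=0, rc=0, opcd=31):
    """Pack an XO-form word from its fields (hypothetical helper for illustration)."""
    for value, limit in ((opcd, 64), (rd, 32), (ra, 32), (rb, 32),
                         (oe, 2), (xo, 512), (rc, 2)):
        if not 0 <= value < limit:
            raise ValueError("field out of range")
    return ((opcd << 26) | (rd << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


ADD_XO = 266  # extended opcode of addx in Table A-39

# add r3,r4,r5 with OE = 0 and Rc = 0
print(f"{encode_xo_form(3, 4, 5, ADD_XO):08X}")
```

For the rows whose B field is shown as 00000 (for example addmex and negx), pass rb=0.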
Table A-40. A-Form
OPCD OPCD OPCD OPCD D D D D A A A 00000 B B 00000 B 00000 C C 00000 XO XO XO XO Rc Rc Rc Rc
Specific Instructions Name faddx faddsx fdivx fdivsx
0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
63 59 63 59
D D D D
A A A A
B B B B
00000 00000 00000 00000
21 21 18 18
Rc Rc Rc Rc
fmaddx fmaddsx fmsubx fmsubsx fmulx fmulsx fnmaddx fnmaddsx fnmsubx fnmsubsx fresx 1 frsqrtex 1 fselx 1 fsqrtx 1, 2 fsqrtsx 1, 2 fsubx fsubsx 63 59 63 59 63 59 63 59 63 59 59 63 63 63 59 63 59 Note:
1Optional 2
D D D D D D D D D D D D D D D D D
A A A A A A A A A A 00000 00000 A 00000 00000 A A
B B B B 00000 00000 B B B B B B B B B B B
C C C C C C C C C C 00000 00000 C 00000 00000 00000 00000
29 29 28 28 25 25 31 31 30 30 24 26 23 22 22 20 20
Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc Rc
instruction
32-bit instruction not implemented by the MPC750
Table A-41. M-Form
Field layouts:
          0-5    6-10   11-15   16-20   21-25   26-30   31
          OPCD   S      A       SH      MB      ME      Rc
          OPCD   S      A       B       MB      ME      Rc
Specific Instructions
Name      0-5    6-10   11-15   16-20   21-25   26-30   31
rlwimix   20     S      A       SH      MB      ME      Rc
rlwinmx   21     S      A       SH      MB      ME      Rc
rlwnmx    23     S      A       B       MB      ME      Rc
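The M-form fields in Table A-41 can be pulled out of an instruction word with shifts and masks, as in the Python sketch below. Both function names are hypothetical. The rotate_mask helper is not taken from this appendix: it assumes the PowerPC architecture's definition of the rotate mask, with ones from bit MB through bit ME (bit 0 being the most-significant bit) and wrapping around when MB is greater than ME.

```python
def decode_m_form(word):
    """Split a 32-bit M-form word into its fields (hypothetical helper)."""
    return {
        "opcd": (word >> 26) & 0x3F,  # bits 0-5
        "s": (word >> 21) & 0x1F,     # bits 6-10
        "a": (word >> 16) & 0x1F,     # bits 11-15
        "sh": (word >> 11) & 0x1F,    # bits 16-20 (register B for rlwnmx)
        "mb": (word >> 6) & 0x1F,     # bits 21-25
        "me": (word >> 1) & 0x1F,     # bits 26-30
        "rc": word & 1,               # bit 31
    }


def rotate_mask(mb, me):
    """Ones in bits mb..me, wrapping when mb > me (assumed architectural definition)."""
    from_mb = 0xFFFFFFFF >> mb                         # ones in bits mb..31
    up_to_me = (0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF  # ones in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)


fields = decode_m_form(0x5483103A)  # rlwinm r3,r4,2,0,29
print(fields, f"{rotate_mask(fields['mb'], fields['me']):08X}")
```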
A.5 Instruction Set Legend
Table A-42. PowerPC Instruction Set Legend
UISA VEA OEA Supervisor Level Optional Form XO XO XO D D D D XO XO X X D D I B XL XL X D X D X XL XL XL XL XL XL XL XL X
Table A-42 provides general information on the PowerPC instruction set (such as the architectural level, privilege level, and form).
addx addcx addex addi addic addic. addis addmex addzex andx andcx andi. andis. bx bcx bcctrx bclrx cmp cmpi cmpl cmpli cntlzwx crand crandc creqv crnand crnor cror crorc crxor dcba

Table A-42. PowerPC Instruction Set Legend (continued)
UISA dcbf dcbi dcbst dcbt dcbtst dcbz divwx divwux eciwx ecowx eieio eqvx extsbx extshx fabsx faddx faddsx fcmpo fcmpu fctiwx fctiwzx fdivx fdivsx fmaddx fmaddsx fmrx fmsubx fmsubsx fmulx fmulsx fnabsx fnegx fnmaddx fnmaddsx VEA OEA Supervisor Level Optional Form X X X X X X XO XO X X X X X X X A A X X X X A A A A X A A A A X X A A
Table A-42. PowerPC Instruction Set Legend (continued)
UISA fnmsubx fnmsubsx fresx frspx frsqrtex fselx fsqrtx fsqrtsx fsubx fsubsx icbi isync lbz lbzu lbzux lbzx lfd lfdu lfdux lfdx lfs lfsu lfsux lfsx lha lhau lhaux lhax lhbrx lhz lhzu lhzux lhzx lmw 2 VEA OEA Supervisor Level Optional Form A A A X A A A A A A X XL D D X X D D X X D D X X D D X X X D D X X D
Table A-42. PowerPC Instruction Set Legend (continued)
UISA lswi 2 lswx 2 lwarx lwbrx lwz lwzu lwzux lwzx mcrf mcrfs mcrxr mfcr mffs mfmsr mfspr1 mfsr mfsrin mftb mtcrf mtfsb0x mtfsb1x mtfsfx mtfsfix mtmsr mtspr1 mtsr mtsrin mulhwx mulhwux mulli mullwx nandx negx norx VEA OEA Supervisor Level Optional Form X X X X D D X X XL X X X X X XFX X X XFX XFX X X XFL X X XFX X X XO XO D XO X XO X
Table A-42. PowerPC Instruction Set Legend (continued)
UISA orx orcx ori oris rfi rlwimix rlwinmx rlwnmx sc slwx srawx srawix srwx stb stbu stbux stbx stfd stfdu stfdux stfdx stfiwx stfs stfsu stfsux stfsx sth sthbrx sthu sthux sthx stmw 2 stswi
2
VEA
OEA
Supervisor Level
Optional
Form X X D D

XL M M M SC X X X X D D X X D D X X X D D X X D X D X X D X X
stswx 2
Table A-42. PowerPC Instruction Set Legend (continued)
UISA stw stwbrx stwcx. stwu stwux stwx subfx subfcx subfex subfic subfmex subfzex sync tlbiax tlbiex tlbsync tw twi xorx xori xoris Notes:
1
VEA
OEA
Supervisor Level
Optional
Form D X X D X X XO XO XO D XO XO X

X X X X D X D D
Supervisor- and user-level instruction string or multiple instruction is optional for 64-bit implementations only. 32-bit instruction not implemented by the MPC750
2 Load/store 3 4 Instruction
Appendix B Instructions Not Implemented
This appendix provides a list of the 32-bit and 64-bit PowerPC instructions that are not implemented in the MPC750 microprocessor. Note that any attempt to execute instructions that are not implemented on the MPC750 will generate an illegal instruction exception. Note that exceptions are referred to as interrupts in the architecture specification. Table B-1 provides the 32-bit PowerPC instructions that are optional to the PowerPC architecture but not implemented by the MPC750.
Table B-1. 32-Bit Instructions Not Implemented by the MPC750 Processor
Mnemonic   Instruction
dcba       Data Cache Block Allocate
fsqrt      Floating Square Root (Double-Precision)
fsqrts     Floating Square Root Single
tlbia      TLB Invalidate All
Table B-2 provides a list of 64-bit instructions that are not implemented by the MPC750.
Table B-2. 64-Bit Instructions Not Implemented by the MPC750 Processor
Mnemonic   Instruction
cntlzd     Count Leading Zeros Double Word
divd       Divide Double Word
divdu      Divide Double Word Unsigned
extsw      Extend Sign Word
fcfid      Floating Convert From Integer Double Word
fctid      Floating Convert to Integer Double Word
fctidz     Floating Convert to Integer Double Word with Round toward Zero
ld         Load Double Word
ldarx      Load Double Word and Reserve Indexed
ldu        Load Double Word with Update
ldux       Load Double Word with Update Indexed
ldx        Load Double Word Indexed
Table B-2. 64-Bit Instructions Not Implemented by the MPC750 Processor (continued)
Mnemonic   Instruction
lwa        Load Word Algebraic
lwaux      Load Word Algebraic with Update Indexed
lwax       Load Word Algebraic Indexed
mtmsrd     Move to Machine State Register Double Word
mtsrd      Move to Segment Register Double Word
mtsrdin    Move to Segment Register Double Word Indirect
mulld      Multiply Low Double Word
mulhd      Multiply High Double Word
mulhdu     Multiply High Double Word Unsigned
rldcl      Rotate Left Double Word then Clear Left
rldcr      Rotate Left Double Word then Clear Right
rldic      Rotate Left Double Word Immediate then Clear
rldicl     Rotate Left Double Word Immediate then Clear Left
rldicr     Rotate Left Double Word Immediate then Clear Right
rldimi     Rotate Left Double Word Immediate then Mask Insert
slbia      SLB Invalidate All
slbie      SLB Invalidate Entry
sld        Shift Left Double Word
srad       Shift Right Algebraic Double Word
sradi      Shift Right Algebraic Double Word Immediate
srd        Shift Right Double Word
std        Store Double Word
stdcx.     Store Double Word Conditional Indexed
stdu       Store Double Word with Update
stdux      Store Double Word Indexed with Update
stdx       Store Double Word Indexed
td         Trap Double Word
tdi        Trap Double Word Immediate
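Because executing any of these instructions on the MPC750 raises an illegal instruction exception, a build or disassembly script may want to flag them early. The following Python sketch shows one way to do that with the mnemonics transcribed from Table B-1 and Table B-2. The function name traps_as_illegal and the set names are hypothetical, and the check only matches base mnemonics exactly as they are spelled in the tables.

```python
# Mnemonics transcribed from Table B-1 (optional 32-bit instructions).
OPTIONAL_32BIT_UNIMPLEMENTED = frozenset({"dcba", "fsqrt", "fsqrts", "tlbia"})

# Mnemonics transcribed from Table B-2 (64-bit instructions).
SIXTY_FOUR_BIT_UNIMPLEMENTED = frozenset("""
    cntlzd divd divdu extsw fcfid fctid fctidz ld ldarx ldu ldux ldx
    lwa lwaux lwax mtmsrd mtsrd mtsrdin mulld mulhd mulhdu
    rldcl rldcr rldic rldicl rldicr rldimi slbia slbie
    sld srad sradi srd std stdcx. stdu stdux stdx td tdi
""".split())


def traps_as_illegal(mnemonic):
    """Return True if the mnemonic is listed in Table B-1 or Table B-2."""
    base = mnemonic.strip().lower()
    return (base in OPTIONAL_32BIT_UNIMPLEMENTED
            or base in SIXTY_FOUR_BIT_UNIMPLEMENTED)


for name in ("lwz", "ld", "fsqrt"):
    print(name, traps_as_illegal(name))
```

A False result only means the mnemonic is absent from the two tables. It does not by itself establish that the MPC750 implements the instruction.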
Appendix C MPC755 Embedded G3 Microprocessor
The MPC755 is a derivative of the MPC750 microprocessor design and is intended primarily for use in embedded systems. All of the information in the MPC750 RISC Microprocessor Family User's Manual applies to the MPC755 microprocessor with the exceptions and additions noted in this appendix. In the event the two descriptions conflict with each other, this appendix supersedes the information in the MPC750 RISC Microprocessor Family User's Manual.
The MPC745 is a lower-pin-count device that operates identically to the MPC755, except that it doesn't implement the L2 cache interface. In the same way that the MPC750 User's Manual also describes the functionality of the MPC740, this appendix describes the functionality of the MPC745. All information herein applies to the MPC745, except where otherwise noted (in particular, the L2 cache information does not apply to the MPC745).
This document describes specific details about the implementation of the MPC755 as a low-power, 32-bit member of the processor family that implements the PowerPC architecture, and how it differs from the MPC750. Note that the individual section headings indicate the chapters in the MPC750 User's Manual to which they correspond. The sections are as follows:
* C.1, "MPC755 Overview," describes general features of the MPC755 with respect to the PowerPC architecture.
* C.4, "The MPC755 Programming Model (Chapter 2)," describes the differences between the programming model of the MPC750 and MPC755.
* C.5, "MPC755 L1 Instruction and Data Cache Operation (Chapter 3)," describes the aspects of the L1 instruction and data cache operation that are specific to the MPC755.
* C.6, "MPC755 Exceptions (Chapter 4)," describes how the MPC755 embedded processor implements the exception model defined by the PowerPC operating environment architecture (OEA).
* C.7, "MPC755 Memory Management (Chapter 5)," describes the MPC755 embedded processor's implementation of the memory management unit (MMU) specifications provided by the PowerPC operating environment architecture (OEA).
* C.8, "MPC755 Instruction Timing (Chapter 6)," describes how the MPC755 embedded processor fetches, dispatches, and executes instructions and how it reports the results of instruction execution.
* C.9, "MPC755 Signal Descriptions (Chapter 7)," describes the MPC755 embedded processor's external signals.
* C.10, "MPC755 System Interface Operation (Chapter 8)," describes the MPC755 embedded processor bus interface and its operation.
* C.11, "MPC755 L2 Cache Interface Operation (Chapter 9)," describes the L2 cache interface and the private memory features of the MPC755.
* C.12, "Power and Thermal Management (Chapter 10)," describes the hardware support provided by the MPC755 for power and thermal management.
* C.13, "Performance Monitor (Chapter 11)," describes the performance monitor of the MPC755.
Errata for the previous version of the MPC750 RISC Microprocessor Family User's Manual are listed in Appendix D, "User's Manual Revision History." These corrections also apply to the MPC740. Table C-1 provides a revision history for this appendix.
Table C-1. Document Revision History
Document Revision   Substantive Change(s)
Rev. 0-2            Initial release of the MPC750 RISC Microprocessor Family User's Manual Errata.
Rev. 3              Combined the MPC750 User's Manual Errata and MPC755 Supplement documents.
                    MPC755 Supplement, Section 9.2.2--Edited first sentence of second bullet.
                    MPC755 Supplement, Section 9.4.1--In Table 26, replaced L2SL (bit 16) description.
This appendix       This material was taken from the MPC755 Supplement and added to Revision 1 of the MPC750 RISC Microprocessor Family User's Manual document as this appendix. No substantive changes to the information.
C.1 MPC755 Overview
This section is an overview of the MPC755. The following list of functional additions to the MPC755 from the MPC750 summarizes the changes visible either to a programmer or a system designer.
* Instruction and data cache locking mechanism added
* Four IBAT and four DBAT entries added
* Software table search mode added
* Four special-purpose (SPRG) registers added
* Parity generation and detection on L2 address bus added
* Instruction-only mode to L2 cache added
* Private SRAM capability to L2 cache interface added
* PB3-type SRAM support to L2 cache interface added
* 32-bit data bus mode added
* Bus voltage select (BVSEL) and L2 cache interface voltage select (L2VSEL) added
C.2 MPC755 Functional Description
This section summarizes some of the functional differences between the MPC750 and the MPC755. For information about the MPC755 L1 cache, see C.5, "MPC755 L1 Instruction and Data Cache Operation (Chapter 3)." The MPC755 has independent on-chip, 32-Kbyte, eight-way set-associative, physically addressed caches for instructions and data and independent instruction and data memory management units (MMUs). Each MMU has a 128-entry, two-way set-associative translation lookaside buffer (DTLB and ITLB) that saves recently used page address translations. Block address translation on the MPC755 is performed by either two four-entry or two eight-entry BAT arrays--one for instruction and one for data block address translation (IBAT and DBAT arrays). Note that the IBAT and DBAT arrays defined by the PowerPC architecture only contain four entries each. During block translation, effective addresses are compared simultaneously with all enabled BAT entries. The MPC755 also optionally supports software table search operations. The L2 cache is implemented with an on-chip, two-way set-associative tag memory, and with external, synchronous SRAMs for data storage. The external SRAMs are accessed through a dedicated L2 cache port that supports a single bank of up to 1 Mbyte of synchronous SRAMs. For information about the L2 cache implementation, see C.11, "MPC755 L2 Cache Interface Operation (Chapter 9)." The MPC755 has a 32-bit address bus and a 32/64-bit data bus. Multiple devices compete for system resources through a central external arbiter. The MPC755 three-state cache-coherency protocol (MEI) supports the exclusive, modified, and invalid states, a compatible subset of the modified/exclusive/shared/invalid (MESI) four-state protocol, and it operates coherently in systems with four-state caches. The MPC755 supports single-beat and burst data transfers for memory accesses and memory-mapped I/O operations. 
The system interface is described in C.9, "MPC755 Signal Descriptions (Chapter 7)," and C.10, "MPC755 System Interface Operation (Chapter 8)."
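The cache and TLB organization described above implies a set count that is easy to derive. The Python sketch below works through the arithmetic. The 32-byte block size comes from the feature list in C.3. The function names are hypothetical, and the address split assumes a conventional physically indexed layout (block offset in the low-order bits, then the set index, then the tag), which this appendix does not spell out bit by bit.

```python
L1_BYTES = 32 * 1024   # 32-Kbyte L1 cache (instruction or data)
L1_WAYS = 8            # eight-way set-associative
BLOCK_BYTES = 32       # 32-byte (eight-word) cache block, per C.3

TLB_ENTRIES = 128      # 128-entry ITLB or DTLB
TLB_WAYS = 2           # two-way set-associative


def set_count(capacity, ways, unit=1):
    """Number of sets for a structure of `capacity` units of size `unit`."""
    return capacity // (ways * unit)


L1_SETS = set_count(L1_BYTES, L1_WAYS, BLOCK_BYTES)   # 128 sets
TLB_SETS = set_count(TLB_ENTRIES, TLB_WAYS)           # 64 sets

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1            # 5
INDEX_BITS = L1_SETS.bit_length() - 1                 # 7


def split_physical_address(pa):
    """Return (tag, set index, byte offset) under the assumed layout."""
    offset = pa & (BLOCK_BYTES - 1)
    index = (pa >> OFFSET_BITS) & (L1_SETS - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(L1_SETS, TLB_SETS, split_physical_address(0x12345678))
```

With these figures, 5 offset bits and 7 index bits leave a 20-bit tag for a 32-bit physical address.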
The MPC755 has four software-controllable power-saving modes. Three static modes (doze, nap, and sleep) progressively reduce power dissipation. When functional units are idle, a dynamic power management mode causes those units to enter a low-power mode automatically without affecting operational performance, software execution, or external hardware. The MPC755 also provides a thermal assist unit (TAU) and a way to reduce the instruction fetch rate for limiting power dissipation. Power management is described in C.12, "Power and Thermal Management (Chapter 10)." Figure C-1 shows the MPC755 block diagram and parallel organization of the execution units (shaded in the diagram). The instruction unit fetches, dispatches, and predicts branch instructions. Note that this is a conceptual model that shows basic features rather than attempting to show how features are implemented physically.
Figure C-1. MPC755 Block Diagram
[Block diagram not reproduced. Blocks shown: instruction unit (fetcher, branch processing unit with 64-entry BTIC, BHT, CTR, and LR, six-word instruction queue, dispatch unit); instruction MMU (shadow SRs, IBAT array, ITLB) with 32-Kbyte I cache and tags; integer unit 1, integer unit 2, system register unit, load/store unit (EA calculation, store queue), and floating-point unit, each with reservation stations; GPR file and FPR file, each with six rename buffers; completion unit with six-entry reorder buffer; data MMU (original SRs, DBAT array, DTLB) with 32-Kbyte D cache and tags; 60x bus interface unit (instruction fetch queue, L1 castout queue, data load queue) with 32-bit address bus and 32/64-bit data bus; L2 bus interface unit (L2 controller, L2CR, L2 tags, L2 castout queue) with 17-bit L2 address bus and 64-bit L2 data bus (not in the MPC745). Additional features: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management, performance monitor.]
C.3 MPC755 Features
This section lists the features of the MPC755. The interrelationship of these features is shown in Figure C-1. The major features of the MPC755 are as follows:
* High-performance, superscalar microprocessor
  -- As many as four instructions can be fetched from the instruction cache per clock cycle
  -- As many as two instructions can be dispatched per clock
  -- As many as six instructions can execute per clock (including two integer instructions)
  -- Single-clock-cycle execution for most instructions
* Six independent execution units and two register files
  -- BPU featuring both static and dynamic branch prediction
     - 64-entry (16-set, four-way set-associative) branch target instruction cache (BTIC), a cache of branch instructions that have been encountered in branch/loop code sequences. If a target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically, if a fetch access hits the BTIC, it provides the first two instructions in the target stream.
     - 512-entry branch history table (BHT) with two bits per entry for four levels of prediction--not-taken, strongly not-taken, taken, strongly taken
     - Branch instructions that do not update the count register (CTR) or link register (LR) are removed from the instruction stream
  -- Two integer units (IUs) that share thirty-two 32-bit GPRs for integer operands
     - IU1 can execute any integer instruction
     - IU2 can execute all integer instructions except multiply and divide instructions (shift, rotate, arithmetic, and logical instructions). Most instructions that execute in the IU2 take one cycle to execute. The IU2 has a single-entry reservation station.
  -- Three-stage floating-point unit (FPU)
     - Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
     - Supports non-IEEE mode for time-critical operations
     - Hardware support for denormalized numbers
     - Single-entry reservation station
     - Thirty-two 64-bit FPRs for single- or double-precision operands
  -- Two-stage load/store unit (LSU)
     - Two-entry reservation station
     - Single-cycle, pipelined cache access
     - Dedicated adder performs EA calculations
     - Performs alignment and precision conversion for floating-point data
     - Performs alignment and sign extension for integer data
     - Three-entry store queue
     - Supports both big- and little-endian modes
  -- System register unit (SRU) handles miscellaneous instructions
     - Executes CR logical and Move to/Move from SPR instructions (mtspr and mfspr)
     - Single-entry reservation station
* Rename buffers
  -- Six GPR rename buffers
  -- Six FPR rename buffers
  -- Condition register buffering supports two CR writes per clock
* Completion unit
  -- The completion unit retires an instruction from the six-entry reorder buffer (completion queue) when all instructions ahead of it have been completed, the instruction has finished execution, and no exceptions are pending.
  -- Guarantees sequential programming model (precise exception model)
  -- Monitors all dispatched instructions and retires them in order
  -- Tracks unresolved branches and flushes instructions from the mispredicted branch
  -- Retires as many as two instructions per clock
* Separate on-chip instruction and data caches (Harvard architecture)
  -- 32-Kbyte, eight-way set-associative instruction and data caches
  -- Pseudo least-recently-used (PLRU) replacement algorithm
  -- 32-byte (eight-word) cache block
  -- Physically indexed/physical tags
  -- Cache write-back or write-through operation programmable on a per-page or per-block basis
  -- Instruction cache can provide four instructions per clock; data cache can provide two words per clock
  -- Caches can be disabled in software
  -- As many as six of the eight ways can be locked, or the entire cache can be locked, in software
  -- Data cache coherency (MEI) maintained in hardware
  -- The critical double word is made available to the requesting unit when it is burst into the line-fill buffer. The cache is nonblocking, so it can be accessed during this operation.
* Level 2 (L2) cache interface (the L2 cache interface is not supported in the MPC745)
  -- On-chip two-way set-associative L2 cache controller and tags
  -- External data SRAMs
  -- Support for 256-Kbyte, 512-Kbyte, and 1-Mbyte L2 caches
  -- 64-byte (256-Kbyte/512-Kbyte) and 128-byte (1-Mbyte) sectored line size
  -- Supports flow-through (register-buffer), both PB2 and PB3 pipelined (register-register), and pipelined late-write (register-register) synchronous burst SRAMs
* Separate memory management units (MMUs) for instructions and data
  -- 52-bit virtual address; 32-bit physical address
  -- Address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte segments
  -- Memory programmable as write-back/write-through, cacheable/noncacheable, and coherency enforced/coherency not enforced on a page or block basis
  -- Separate IBATs and DBATs (selectable four or eight each) also defined as SPRs
  -- Separate instruction and data translation lookaside buffers (TLBs)
     - Both TLBs are 128-entry, two-way set-associative, and use PLRU replacement algorithm
  -- TLBs are reloaded by the hardware or, optionally, by software
* Separate bus interface units for system memory and for the L2 cache
  -- Bus interface features include the following:
     - Selectable bus-to-core clock frequency ratios as described in the MPC755 Hardware Specification
     - 32/64-bit, split-transaction external data bus with burst transfers with 32-bit mode selectable at reset
     - Support for address pipelining and limited out-of-order bus transactions
     - Single-entry load queue
     - Single-entry instruction fetch queue
     - Two-entry L1 cache castout queue
     - No-DRTRY mode eliminates the DRTRY signal from the qualified bus grant. This allows the forwarding of data during load operations to the internal core one bus cycle sooner than if the use of DRTRY is enabled.
  -- L2 cache interface features (which are not implemented on the MPC745) include the following:
     - Core-to-L2 frequency divisors as described in the MPC755 Hardware Specification
     - Four-entry L2 cache castout queue in L2 cache BIU
     - 17-bit address bus
     - 64-bit data bus
     - 8-bit parity for address and data
     - Private memory mode, allowing software to access L2 SRAM as private memory space
* Multiprocessing support features include the following:
  -- Hardware-enforced, three-state cache coherency protocol (MEI) for data cache
  -- Load/store with reservation instruction pair for atomic memory references, semaphores, and other multiprocessor operations
* Power and thermal management
  -- Three static modes (doze, nap, and sleep) progressively reduce power dissipation:
     - Doze--All the functional units are disabled except for the time base/decrementer registers and the bus snooping logic.
     - Nap--The nap mode further reduces power consumption by disabling all functional units, disabling snooping, and leaving only the time base register and the PLL in a powered state. If snooping is required, the QACK input signal can be negated to wake up the processor and snooping logic.
     - Sleep--All internal functional units are disabled, after which external system logic may disable the PLL and SYSCLK.
  -- Thermal management facility provides software-controllable thermal management. Thermal management is performed through the use of three supervisor-level registers and an MPC755-specific thermal management exception.
  -- Instruction cache throttling provides control of instruction fetching to limit power consumption.
* Performance monitor can be used to help debug system designs and improve software efficiency.
* In-system testability and debugging features through JTAG boundary-scan capability
C.4
The MPC755 Programming Model (Chapter 2)
This section describes the differences between the programming model of the MPC750 and MPC755. For detailed information about architecture-defined features, see the Programming Environments Manual. This section is organized as follows:
* Section C.4.1, "MPC755-Specific Registers,"
* Section C.4.2, "MPC750 and MPC755 Instruction Use," and
* Section C.4.3, "tlbld and tlbli Instructions."
Figure C-2 shows the registers implemented in the MPC755, indicating those that are defined by the PowerPC architecture and those that are MPC755-specific.
[Figure C-2 register diagram, reconstructed as a list. Numbers in parentheses after a name refer to the notes at the end.]

USER MODEL--UISA
  General-Purpose Registers: GPR0, GPR1, ... GPR31
  Floating-Point Registers: FPR0, FPR1, ... FPR31
  Condition Register: CR
  Floating-Point Status and Control Register: FPSCR
  XER: XER (SPR 1)
  Link Register: LR (SPR 8)
  Count Register: CTR (SPR 9)
  Performance Monitor Registers (For Reading)
    Performance Counters (1): UPMC1 (SPR 937), UPMC2 (SPR 938), UPMC3 (SPR 941), UPMC4 (SPR 942)
    Sampled Instruction Address (1): USIA (SPR 939)
    Monitor Control (1): UMMCR0 (SPR 936), UMMCR1 (SPR 940)

USER MODEL--VEA
  Time Base Facility (For Reading): TBL (TBR 268), TBU (TBR 269)

SUPERVISOR MODEL--OEA
  Configuration Registers
    Hardware Implementation Registers (1): HID0 (SPR 1008), HID1 (SPR 1009), HID2 (SPR 1011)
    Processor Version Register: PVR (SPR 287)
    Machine State Register: MSR
  Memory Management Registers
    Instruction BAT Registers: IBAT0U (SPR 528), IBAT0L (SPR 529), ... IBAT3U (SPR 534), IBAT3L (SPR 535), IBAT4U (1) (SPR 560), IBAT4L (1) (SPR 561), ... IBAT7U (1) (SPR 566), IBAT7L (1) (SPR 567)
    Data BAT Registers: DBAT0U (SPR 536), DBAT0L (SPR 537), ... DBAT3U (SPR 542), DBAT3L (SPR 543), DBAT4U (1) (SPR 568), DBAT4L (1) (SPR 569), ... DBAT7U (1) (SPR 574), DBAT7L (1) (SPR 575)
    Software Table Search Registers (1): DMISS (SPR 976), DCMP (SPR 977), HASH1 (SPR 978), HASH2 (SPR 979), IMISS (SPR 980), ICMP (SPR 981), RPA (SPR 982)
    Segment Registers: SR0, SR1, ... SR15
    SDR1: SDR1 (SPR 25)
  Exception Handling Registers
    SPRGs (2): SPRG0 (SPR 272), SPRG1 (SPR 273), ... SPRG7 (SPR 279)
    Data Address Register: DAR (SPR 19)
    DSISR: DSISR (SPR 18)
    Save and Restore Registers: SRR0 (SPR 26), SRR1 (SPR 27)
  Miscellaneous Registers
    External Access Register: EAR (SPR 282)
    Time Base (For Writing): TBL (SPR 284), TBU (SPR 285)
    Data Address Breakpoint Register: DABR (SPR 1013)
    Instruction Address Breakpoint Register (1): IABR (SPR 1010)
    Decrementer: DEC (SPR 22)
    L2 Control Register (1,3): L2CR (SPR 1017)
    L2 Private Memory Control Register (2,3): L2PM (SPR 1016)
  Performance Monitor Registers
    Performance Counters (1): PMC1 (SPR 953), PMC2 (SPR 954), PMC3 (SPR 957), PMC4 (SPR 958)
    Sampled Instruction Address (1): SIA (SPR 955)
    Monitor Control (1): MMCR0 (SPR 952), MMCR1 (SPR 956)
  Power/Thermal Management Registers
    Thermal Assist Unit Registers (1): THRM1 (SPR 1020), THRM2 (SPR 1021), THRM3 (SPR 1022)
    Instruction Cache Throttling Control Register (1): ICTC (SPR 1019)

Notes:
1. These registers are MPC750/755-specific registers. They may not be supported by other processors.
2. SPRG[4-7] and L2PM are MPC755-specific registers. They may not be supported by other processors.
3. Not supported on the MPC745.
Figure C-2. Programming Model--MPC755 Microprocessor Registers
C.4.1
MPC755-Specific Registers
The MPC755 processor programming model is functionally identical to that of the MPC750 except for some differences in the PVR (described in Section C.4.1.2, "Processor Version Register (PVR)") and the L2CR (described in Section C.11.4.1, "L2 Cache Control Register (L2CR)"). Additionally, the following special-purpose registers are added in the MPC755 that are not defined by the PowerPC architecture:
* Special-purpose registers used for general purpose (SPRG[4-7])--Four additional SPRG registers have been implemented to assist in searching the page tables in software. They replace the MSR[TGPR] bit and the four temporary general-purpose registers of the MPC603e. Note that the MSR[TGPR] bit is not implemented in the MPC755. If software table searching is not enabled, then these registers may be used for any supervisor purpose. The format of these registers is the same as that of SPRG[0-3] defined in Chapter 2, "Programming Model."
* Hardware implementation-dependent register 2 (HID2)--This register, which is not implemented in the MPC750, is used to enable L2 address parity, software table search operations, IBAT[4-7] and DBAT[4-7], and instruction and data cache way locking. This register is described in Section C.4.1.3, "Hardware Implementation-Dependent Register 2 (HID2)."
* Instruction and data block address translation entries (IBAT[4-7] and DBAT[4-7]), which are optionally enabled in HID2--BATs are software-controlled arrays that store the available block address translations on-chip. BAT array entries are implemented as pairs of BAT registers that are accessible as supervisor special-purpose registers (SPRs). Four additional IBAT and four additional DBAT array entries provide a mechanism for translating additional blocks as large as 256 Mbytes from the 32-bit effective address space into the physical memory space. This can be used for translating large address ranges whose mappings do not change frequently. The format of these registers is the same as that of IBAT[0-3] and DBAT[0-3] defined in Chapter 2, "Programming Model." The SPR numbers for accessing these registers are outlined in Table C-2.
* The software table search registers are as follows (see C.7, "MPC755 Memory Management (Chapter 5)," for more detailed information):
  -- Data and instruction TLB miss registers (DMISS and IMISS)--The DMISS and IMISS registers contain the effective page address of the access that caused the TLB miss exception. The contents are used by the MPC755 when calculating the values of HASH1 and HASH2, and by the tlbld and tlbli instructions when loading a new TLB entry.
  -- Data and instruction TLB compare registers (DCMP and ICMP)--These registers contain the first word in the required page table entry (PTE). The contents are constructed automatically from the contents of the segment registers and the effective address (DMISS or IMISS) when a TLB miss exception occurs. Each PTE read from the tables during the table search process should be compared with this value to determine whether or not the PTE is a match. Upon execution of a tlbld or tlbli instruction the upper 25 bits of the DCMP or ICMP register and 11 bits of the effective address operand are loaded into the first word of the selected TLB entry.
  -- Primary and secondary hash address registers (HASH1 and HASH2)--These registers contain the physical addresses of the primary and secondary page table entry groups (PTEGs) for the access that caused the TLB miss exception. For convenience, the MPC755 automatically constructs the full physical address by routing bits 0-6 of SDR1 into HASH1 and HASH2 and clearing the lower 6 bits. These registers are read-only and are constructed from the contents of the DMISS or IMISS register (the register choice is determined by which miss was last acknowledged).
  -- Required physical address register (RPA)--During a page table search operation, the software must load the RPA with the second word of the correct PTE. When the tlbld or tlbli instruction is executed, the contents of the RPA register and the DMISS or IMISS register are merged and loaded into the selected TLB entry. The referenced (R) bit is ignored when the write occurs (no location exists in the TLB entry for this bit). The RPA register can be both read and written by software.
* L2 private memory control register (L2PM)--The L2 cache private memory control register allows a portion of the physical address space to be directly mapped into a portion of the L2 SRAM. It is a supervisor-only, read/write, implementation-specific special purpose register (SPR) which is accessed as SPR 1016 (decimal). The L2PM is initialized to all 0s during power-on reset and is described more completely in Section C.11.4.2, "L2 Private Memory Control Register (L2PM)."
C.4.1.1
The MPC755 Additional SPR Encodings
Table C-2 describes the encodings of the MPC755 register set additions described in this section.
Table C-2. Additional SPR Encodings

SPR Decimal  SPR[5-9]  SPR[0-4]  Register  Access
276          01000     10100     SPRG4     Supervisor
277          01000     10101     SPRG5     Supervisor
278          01000     10110     SPRG6     Supervisor
279          01000     10111     SPRG7     Supervisor
560          10001     10000     IBAT4U    Supervisor
561          10001     10001     IBAT4L    Supervisor
562          10001     10010     IBAT5U    Supervisor
563          10001     10011     IBAT5L    Supervisor
564          10001     10100     IBAT6U    Supervisor
565          10001     10101     IBAT6L    Supervisor
566          10001     10110     IBAT7U    Supervisor
567          10001     10111     IBAT7L    Supervisor
568          10001     11000     DBAT4U    Supervisor
569          10001     11001     DBAT4L    Supervisor
570          10001     11010     DBAT5U    Supervisor
571          10001     11011     DBAT5L    Supervisor
572          10001     11100     DBAT6U    Supervisor
573          10001     11101     DBAT6L    Supervisor
574          10001     11110     DBAT7U    Supervisor
575          10001     11111     DBAT7L    Supervisor
976          11110     10000     DMISS     Supervisor
977          11110     10001     DCMP      Supervisor
978          11110     10010     HASH1     Supervisor
979          11110     10011     HASH2     Supervisor
980          11110     10100     IMISS     Supervisor
981          11110     10101     ICMP      Supervisor
982          11110     10110     RPA       Supervisor
1011         11111     10011     HID2      Supervisor
1016         11111     11000     L2PM      Supervisor
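The two binary columns of Table C-2 are simply the upper and lower five bits of the decimal SPR number. The following C sketch reproduces them. The helper names are hypothetical, and the swapped-halves layout returned by spr_insn_field() is an assumption taken from the general PowerPC mfspr/mtspr instruction encoding, not from this appendix.

```c
#include <assert.h>
#include <stdint.h>

/* Upper five bits of a 10-bit SPR number: the SPR[5-9] column of Table C-2. */
uint32_t spr_hi5(uint32_t spr)
{
    assert(spr < 1024u);            /* SPR numbers are 10 bits wide */
    return (spr >> 5) & 0x1Fu;
}

/* Lower five bits of a 10-bit SPR number: the SPR[0-4] column of Table C-2. */
uint32_t spr_lo5(uint32_t spr)
{
    assert(spr < 1024u);
    return spr & 0x1Fu;
}

/* Assumption (general PowerPC encoding): inside an mfspr/mtspr instruction
 * the two halves are stored in swapped order, low half first. */
uint32_t spr_insn_field(uint32_t spr)
{
    return (spr_lo5(spr) << 5) | spr_hi5(spr);
}
```

For example, SPRG4 (SPR 276) splits into 01000 and 10100, matching the first row of the table.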
C.4.1.2
Processor Version Register (PVR)
The processor version register (PVR) is a 32-bit, read-only register. It is also present in the MPC750, but the MPC755 initializes it to a different value. It contains a value identifying the specific version (model) and revision level of the processor (see Figure C-3). The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is supervisor-level only; write access is not provided.
Bits 0-15: Version | Bits 16-31: Revision
Figure C-3. Processor Version Register (PVR)
The PVR consists of two 16-bit fields:
* Version (bits 0-15)--A 16-bit number that uniquely identifies a particular processor version. This number can be used to determine the version of a processor; it may not distinguish between different end product models if more than one model uses the same processor.
* Revision (bits 16-31)--A 16-bit number that distinguishes between various releases of a particular version (that is, an engineering change level). The value of the revision portion of the PVR is implementation-specific. The processor revision level is changed for each revision of the device.
Software can distinguish between the MPC750 and the MPC755 by reading the PVR. The MPC755 PVR reads as 0x0008_3100. The version is 0x0008 and the revision level starts at 0x3100.
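As a minimal sketch of the field split described above, the following C helpers extract the version and revision from a PVR value. The function names are hypothetical, and the value is passed in as a parameter because actually reading the register requires a supervisor-level mfspr, which is outside the scope of this sketch. PowerPC bit 0 is the most significant bit, so bits 0-15 are the upper halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Version field: PowerPC bits 0-15, that is, the upper 16 bits. */
uint16_t pvr_version(uint32_t pvr)
{
    return (uint16_t)(pvr >> 16);
}

/* Revision field: PowerPC bits 16-31, that is, the lower 16 bits. */
uint16_t pvr_revision(uint32_t pvr)
{
    return (uint16_t)(pvr & 0xFFFFu);
}

/* Worked example using the MPC755 value quoted in the text. */
void pvr_example(void)
{
    assert(pvr_version(UINT32_C(0x00083100)) == 0x0008u);
    assert(pvr_revision(UINT32_C(0x00083100)) == 0x3100u);
}
```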
C.4.1.3
Hardware Implementation-Dependent Register 2 (HID2)
The MPC755 implements an additional hardware implementation-dependent register not described in Chapter 2, "Programming Model," shown in Figure C-4. It is a supervisor-only, read/write, implementation-specific special purpose register (SPR) which is accessed as SPR 1011 (decimal).
Bits 0-10: Reserved (0) | Bit 11: L2AP_EN | Bit 12: SWT_EN | Bit 13: HIGH_BAT_EN | Bits 14-15: Reserved (0) | Bits 16-18: IWLCK[0-2] | Bits 19-23: Reserved (0) | Bits 24-26: DWLCK[0-2] | Bits 27-31: Reserved (0)
Figure C-4. Hardware Implementation-Dependent Register 2 (HID2)
Table C-3 describes the HID2 fields.
Table C-3. Hardware Implementation Dependent Register 2 (HID2) Field Descriptions

Bits   Name         Description
0-10   --           Reserved
11     L2AP_EN      L2 address parity enable. When this bit is set, some of the L2 address signals are used in the parity generated on L2DP[0:7]. See Section C.11.5, "L2 Address and Data Parity Signals," for the combinations supported.
12     SWT_EN       Software table search enable. Setting this bit causes one of three new exceptions when a TLB miss occurs. See C.6, "MPC755 Exceptions (Chapter 4)," and C.7, "MPC755 Memory Management (Chapter 5)," for more information on the use of software table search operations.
13     HIGH_BAT_EN  IBAT[4-7] and DBAT[4-7] enable. When this bit is set, four more IBAT and DBAT entries are available for translating blocks of memory. See C.4, "The MPC755 Programming Model (Chapter 2)," for more information on the SPR numbers used for accessing the new BATs.
14-15  --           Reserved
16-18  IWLCK[0-2]   Instruction cache way lock. Useful for locking blocks of instructions into the instruction cache for time-critical applications that require deterministic behavior. See Section C.5.2.3, "Performing Data and Instruction Cache Locking."
                    000 = no ways locked
                    001 = way0 locked
                    010 = way0 thru way1 locked
                    011 = way0 thru way2 locked
                    100 = way0 thru way3 locked
                    101 = way0 thru way4 locked
                    110 = way0 thru way5 locked
                    111 = Reserved
19-23  --           Reserved
24-26  DWLCK[0-2]   Data cache way lock. Useful for locking blocks of data into the data cache for time-critical applications where deterministic behavior is required. See Section C.5.2.3, "Performing Data and Instruction Cache Locking."
                    000 = no ways locked
                    001 = way0 locked
                    010 = way0 thru way1 locked
                    011 = way0 thru way2 locked
                    100 = way0 thru way3 locked
                    101 = way0 thru way4 locked
                    110 = way0 thru way5 locked
                    111 = Reserved
27-31  --           Reserved
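The HID2 fields in Table C-3 can be expressed as C masks, as sketched below. The macro and function names are hypothetical. The sketch relies on two points from the table: PowerPC numbers bit 0 as the most significant bit, and the IWLCK/DWLCK encodings equal the number of ways locked (0 through 6, with 7 reserved). It only computes register values; writing HID2 itself requires a supervisor-level mtspr and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for PowerPC bit n of a 32-bit register (bit 0 is the MSB). */
#define PPC_BIT(n)        (UINT32_C(1) << (31 - (n)))

#define HID2_L2AP_EN      PPC_BIT(11)
#define HID2_SWT_EN       PPC_BIT(12)
#define HID2_HIGH_BAT_EN  PPC_BIT(13)
#define HID2_IWLCK_SHIFT  13                      /* field is bits 16-18 */
#define HID2_IWLCK_MASK   (UINT32_C(7) << HID2_IWLCK_SHIFT)
#define HID2_DWLCK_SHIFT  5                       /* field is bits 24-26 */
#define HID2_DWLCK_MASK   (UINT32_C(7) << HID2_DWLCK_SHIFT)

/* Return hid2 with IWLCK set so that way0..way(ways-1) are locked. */
uint32_t hid2_set_iwlck(uint32_t hid2, unsigned ways)
{
    assert(ways <= 6u);                           /* encoding 111 is reserved */
    return (hid2 & ~HID2_IWLCK_MASK) | ((uint32_t)ways << HID2_IWLCK_SHIFT);
}

/* Return hid2 with DWLCK set so that way0..way(ways-1) are locked. */
uint32_t hid2_set_dwlck(uint32_t hid2, unsigned ways)
{
    assert(ways <= 6u);                           /* encoding 111 is reserved */
    return (hid2 & ~HID2_DWLCK_MASK) | ((uint32_t)ways << HID2_DWLCK_SHIFT);
}
```

Computing the new value with a read-modify-write helper like this leaves the reserved bits and the other enable bits untouched.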
C.4.2
MPC750 and MPC755 Instruction Use
This section describes some restrictions on the use of the stfd, mtsr, and mtsrin instructions on both the MPC750 and MPC755. In addition, the dcbz instruction has cache coherency implications described in Section C.5.1.2, "dcbz and L1 Cache Coherency."
C.4.2.1
stfd Instruction Use
The MPC750 and MPC755 require that the FPRs be initialized with floating-point values before the stfd instruction is used. Otherwise, a random power-on value for an FPR may cause unpredictable device behavior when the stfd instruction is executed. Note that any floating-point value loaded into the FPRs is acceptable.
C.4.2.2
isync Instruction Use with mtsr and mtsrin
The MPC750 and MPC755 have a restriction on the use of the mtsr and mtsrin instructions not described in the Programming Environments Manual or in Chapter 2, "Programming Model." The MPC750 and MPC755 require that an isync instruction be executed after either an mtsr or mtsrin instruction. This isync instruction must occur after the execution
of the mtsr or mtsrin and before the data address translation mechanism uses any of the on-chip segment registers.
C.4.3
tlbld and tlbli Instructions
This section provides a detailed description of the two implementation-specific instructions used for software table search operations--tlbld and tlbli (same as the MPC603e). The address translation mechanism is defined in terms of segment descriptors and page table entries (PTEs) used by processors that implement the PowerPC architecture to locate the effective-to-physical address mapping for a particular access. The PTEs reside in page tables in memory. As defined for 32-bit implementations by the PowerPC architecture, segment descriptors reside in 16 on-chip segment registers. Similar to the MPC603e, the MPC755 provides two implementation-specific instructions (tlbld and tlbli) that are used by software table search operations following TLB misses to load TLB entries on-chip (not provided by the MPC750 because the MPC750 does not support software table search operations). Refer to C.7, "MPC755 Memory Management (Chapter 5)," for more information about the TLB registers and software table search operations with the MPC755. Table C-4 lists the TLB instructions implemented in the MPC755.
Table C-4. Translation Lookaside Buffer Management Instructions

Name                        Mnemonic  Operand Syntax
TLB Invalidate Entry        tlbie     rB
TLB Synchronize             tlbsync   --
Load Data TLB Entry         tlbld     rB
Load Instruction TLB Entry  tlbli     rB
Because the presence and exact semantics of the translation lookaside buffer management instructions are implementation-dependent, system software should incorporate uses of the instructions into subroutines to maximize compatibility with programs written for other processors. For more information on the PowerPC instruction set, refer to Chapter 4, "Addressing Modes and Instruction Set Summary," and Chapter 8, "Instruction Set," in the Programming Environments Manual.
tlbld    Load Data TLB Entry    (Integer Unit)

Syntax:  tlbld rB

Encoding:
  Bits 0-5    31
  Bits 6-10   00000 (reserved)
  Bits 11-15  00000 (reserved)
  Bits 16-20  B
  Bits 21-30  978
  Bit 31      0 (reserved)

Operation:
  EA <- (rB)
  TLB entry created from DCMP and RPA
  DTLB entry selected by EA[15-19] and SRR1[WAY] <- created TLB entry
The EA is the contents of rB. The tlbld instruction loads the contents of the data PTE compare (DCMP) and required physical address (RPA) registers into the first word of the selected data TLB entry. The specific DTLB entry to be loaded is selected by the EA and the SRR1[WAY] bit.
The tlbld instruction should only be executed when address translation is disabled (MSR[IR] = 0 and MSR[DR] = 0). It is possible to execute the tlbld instruction when address translation is enabled, but extreme caution should be used in doing so. If data address translation is enabled (MSR[DR] = 1), tlbld must be preceded by a sync instruction and followed by a context-synchronizing instruction. Care should also be taken to avoid modifying the instruction TLB entries that translate current instruction prefetch addresses.
This is a supervisor-level instruction; it is also an MPC755-specific instruction and is not part of the PowerPC instruction set.
Other registers altered:
* None
tlbli    Load Instruction TLB Entry    (Integer Unit)

Syntax:  tlbli rB

Encoding:
  Bits 0-5    31
  Bits 6-10   00000 (reserved)
  Bits 11-15  00000 (reserved)
  Bits 16-20  B
  Bits 21-30  1010
  Bit 31      0 (reserved)

Operation:
  EA <- (rB)
  TLB entry created from ICMP and RPA
  ITLB entry selected by EA[15-19] and SRR1[WAY] <- created TLB entry
The EA is the contents of rB. The tlbli instruction loads the contents of the instruction PTE compare (ICMP) and required physical address (RPA) registers into the first word of the selected instruction TLB entry. The specific ITLB entry to be loaded is selected by the EA and the SRR1[WAY] bit.
The tlbli instruction should only be executed when address translation is disabled (MSR[IR] = 0 and MSR[DR] = 0). It is possible to execute the tlbli instruction when address translation is enabled, but extreme caution should be used in doing so. If instruction address translation is enabled (MSR[IR] = 1), tlbli must be followed by a context-synchronizing instruction such as isync or rfi. Care should also be taken to avoid modifying the instruction TLB entries that translate current instruction prefetch addresses.
This is a supervisor-level instruction; it is also an MPC755-specific instruction and is not part of the PowerPC instruction set.
Other registers altered:
* None
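For toolchains whose assembler does not recognize these mnemonics, the instruction words can be assembled by hand from the field layouts given above (primary opcode 31 in bits 0-5, rB in bits 16-20, extended opcode 978 or 1010 in bits 21-30, all other bits zero). The following C sketch does that arithmetic. The function names are hypothetical, and the sketch only computes the 32-bit encodings; it does not execute anything.

```c
#include <assert.h>
#include <stdint.h>

#define XO_TLBLD 978u   /* extended opcode of tlbld, bits 21-30 */
#define XO_TLBLI 1010u  /* extended opcode of tlbli, bits 21-30 */

/* Build the instruction word: opcode 31 | rB | extended opcode. */
static uint32_t encode_tlb_load(unsigned rb, uint32_t xo)
{
    assert(rb < 32u);                    /* rB names one of GPR0-GPR31 */
    return (UINT32_C(31) << 26)          /* bits 0-5 */
         | ((uint32_t)rb << 11)          /* bits 16-20 */
         | (xo << 1);                    /* bits 21-30; bit 31 stays 0 */
}

uint32_t encode_tlbld(unsigned rb) { return encode_tlb_load(rb, XO_TLBLD); }
uint32_t encode_tlbli(unsigned rb) { return encode_tlb_load(rb, XO_TLBLI); }
```

With rB = r3, for instance, the two words differ only in the extended opcode field.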
C.5
MPC755 L1 Instruction and Data Cache Operation (Chapter 3)
This section describes L1 cache coherency issues and also describes the new instruction and data cache way locking features of the MPC755 embedded processor. Otherwise, the L1 instruction and data cache operation is the same as the MPC750. The MPC755 includes a mechanism for allocating cache entries for a particular group of ways for both the instruction and data caches. If a way is locked, the data loaded in that cache way will not be replaced by an access to another address; that is, none of the entries in a locked cache way are re-allocated. One to six of the eight ways in a cache can be locked
with the IWLCK and DWLCK bits of the HID2 register. All eight ways of a cache can be locked using the ILOCK or DLOCK bits of the HID0 register. Note that integrated devices based on the MPC603e G2 processor core may also implement entire-cache and cache-way locking. However, the G2-based processor caches are only four-way set-associative, so at most three ways can be locked. Additionally, although the G2 core processors define similar IWLCK[0-2] and DWLCK[0-2] fields in HID2, their bit encodings for enabling way locking differ from those used in the MPC755 and do not correspond to them.
C.5.1
L1 Cache Coherency
This section describes some L1 coherency precautions for the MPC755 in addition to those described in Chapter 3, "L1 Instruction and Data Cache Operation."
C.5.1.1
Coherency Precautions in Single Processor Systems
Note that as described in Chapter 3, "L1 Instruction and Data Cache Operation," great care must be taken when the WIMG bits are changed in the MMU. The following coherency paradoxes can be encountered within a single-processor system:
* Load or store to a caching-inhibited page (WIMG = x1xx) and a cache hit occurs. The MPC755 ignores any hits to an L1 cache block in a memory space marked caching-inhibited (WIMG = x1xx). The L1 cache is bypassed and the access is performed externally as if there were no hit. The data in the cache is not pushed, and the cache block is not invalidated. This operation is similar to that of the MPC750 except that in the case of the MPC750, the access is performed to the 60x bus. In the case of the MPC755, the access is performed to the private memory space if private memory is enabled and the upper-order address bits match the value in L2PM[PMBA]. Alternatively, the access may hit in the L2 cache if it was previously designated as cacheable but the WIMG bits were changed so that the access is cache-inhibited. Although the access may hit in the L2 (if the data was previously loaded when the WIMG bits were set to caching-allowed), the L2 cache does not allocate any new entries for caching-inhibited data. This L2 cache behavior is different from that of the MPC750 for this case.
* Store to a page marked write-through (WIMG = 1xxx) and a cache hit occurs to a modified cache block. The MPC750 and MPC755 work identically in this case and ignore the modified bit in the cache tag. The cache block is updated during the write-through operation but the block remains in the modified state (M).
Note that when WIM bits are changed in the page tables or BAT registers, it is critical that the cache contents reflect the new WIM bit settings. For example, if a block or page that
had allowed caching becomes caching-inhibited, software should ensure that the appropriate cache blocks are flushed to memory and invalidated.
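The WIMG patterns used in this section (x1xx for caching-inhibited, 1xxx for write-through) can be tested with simple bit masks, as in the C sketch below. The names are hypothetical. The sketch treats WIMG as a 4-bit value written W, I, M, G from left to right, and wimg_change_needs_flush() encodes only the one example given in the text (caching-allowed becoming caching-inhibited); other WIM changes may also require cache maintenance.

```c
#include <assert.h>
#include <stdbool.h>

/* WIMG as a 4-bit value, W leftmost, matching patterns such as x1xx. */
#define WIMG_W 0x8u  /* write-through */
#define WIMG_I 0x4u  /* caching-inhibited */
#define WIMG_M 0x2u  /* memory coherency enforced */
#define WIMG_G 0x1u  /* guarded */

/* True for the x1xx pattern. */
bool wimg_caching_inhibited(unsigned wimg)
{
    assert(wimg <= 0xFu);
    return (wimg & WIMG_I) != 0u;
}

/* True for the 1xxx pattern. */
bool wimg_write_through(unsigned wimg)
{
    assert(wimg <= 0xFu);
    return (wimg & WIMG_W) != 0u;
}

/* True when a page or block that allowed caching becomes caching-inhibited,
 * the case for which the text says software should flush and invalidate. */
bool wimg_change_needs_flush(unsigned old_wimg, unsigned new_wimg)
{
    return !wimg_caching_inhibited(old_wimg) && wimg_caching_inhibited(new_wimg);
}
```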
C.5.1.2
dcbz and L1 Cache Coherency
Both the MPC750 and MPC755 processors require protection in the use of the dcbz instruction in order to guarantee cache coherency in a multiprocessor system. Specifically, the dcbz instruction must be:
* Either enveloped by high-level software synchronization protocols (such as semaphores), or
* Preceded by execution of a dcbf instruction to the same address.
One of these precautions must be taken in order to guarantee that there are no simultaneous cache hits from a dcbz instruction and a snoop to that address. If these two events occur simultaneously, stale data may occur, causing system failures.
C.5.2
Cache Locking
This section describes the cache locking and cache-way locking features of the MPC755.
C.5.2.1
Cache Locking Terminology
Cache locking is the ability to prevent some or all of a microprocessor's instruction or data cache from being overwritten. Cache locking can be set for either an entire cache or for individual ways within the cache as follows:
* Entire Cache Locking--When an entire cache is locked, data for read hits within the cache are supplied to the requesting unit in the same manner as hits from an unlocked cache. Similarly, writes that hit in the data cache are written to the cache in the same way as write hits to an unlocked cache. However, any access that misses in the cache is treated as a cache-inhibited access. Cache entries that are invalid at the time of locking remain invalid and inaccessible until the cache is unlocked. When the cache has been unlocked, all entries (including invalid entries) are available. Entire cache locking is inefficient if the number of instructions or the size of data to be locked is small compared to the cache size.
* Way Locking--Locking only a portion of the cache is accomplished by locking ways within the cache. Locking always begins with the first way (way0) and is sequential; that is, locking ways 0, 1, and 2 is possible, but it is not possible to lock only way0 and way2. When using way locking, at least two ways must be left unlocked. The maximum number of lockable ways is six on the MPC755 embedded processor (way0-way5). Unlike entire cache locking, invalid entries in a locked way are accessible and available for data replacement. As hits to the cache fill invalid entries within a locked way, the entries become valid and locked. This behavior differs from entire cache locking, in which invalid entries cannot be allocated. Unlocked ways of the cache behave normally.
Table C-5 summarizes the MPC755 cache organization.
Table C-5. Cache Organization
Instruction Cache Size    32 Kbyte
Data Cache Size           32 Kbyte
Associativity             8-way
Block Size                8 words
Way Size                  4 Kbyte
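The entries in Table C-5 are related by simple arithmetic. The following Python sketch derives the way size and the number of sets per way from the cache size, associativity, and block size, and shows how an address would split into tag, set index, and byte offset. The names are illustrative and not from this manual, and the index layout is an assumption based on a conventional set-associative organization.

```python
# Values from Table C-5; a word is 4 bytes, so 8 words = 32 bytes per block.
CACHE_BYTES = 32 * 1024
WAYS = 8
BLOCK_BYTES = 8 * 4

WAY_BYTES = CACHE_BYTES // WAYS      # bytes held by one way
SETS = WAY_BYTES // BLOCK_BYTES      # blocks (sets) per way


def split_address(addr):
    """Return (tag, set_index, byte_offset) for an address.

    Assumes the low bits select the byte within a block and the next
    bits select the set, as in a conventional set-associative cache.
    """
    byte_offset = addr % BLOCK_BYTES
    set_index = (addr // BLOCK_BYTES) % SETS
    tag = addr // WAY_BYTES
    return tag, set_index, byte_offset


print(WAY_BYTES, SETS)               # 4096 128
print(split_address(0x00001234))     # (1, 17, 20)
```

Under these assumptions, two addresses that differ by a multiple of 4 Kbyte map to the same set, which is why each locked way holds at most one 4-Kbyte-aligned image.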
C.5.2.2 Cache Locking Register Summary
Table C-6 through Table C-8 outline the registers and bits used to perform cache locking on the MPC755 embedded processor. Refer to Chapter 2, "Programming Model," for a complete description of the HID0 and MSR registers. Refer to C.4, "The MPC755 Programming Model (Chapter 2)," for a complete description of the HID2 register.
Table C-6. HID0 Bits Used to Perform Cache Locking
Bits  Name   Description
16    ICE    Instruction cache enable. This bit must be set for instruction cache locking. See Section C.5.2.3.9, "Enabling the Instruction Cache."
17    DCE    Data cache enable. This bit must be set for data cache locking. See Section C.5.2.3.1, "Enabling the Data Cache."
18    ILOCK  Instruction cache lock. Set to lock the entire instruction cache. See Section C.5.2.3.14, "Entire Instruction Cache Locking."
19    DLOCK  Data cache lock. Set to lock the entire data cache. See Section C.5.2.3.6, "Entire Data Cache Locking."
20    ICFI   Instruction cache flash invalidate. Setting and then clearing this bit invalidates the entire instruction cache. See Section C.5.2.3.16, "Invalidating the Instruction Cache (Even if Locked)."
21    DCFI   Data cache flash invalidate. Setting and then clearing this bit invalidates the entire data cache. See Section C.5.2.3.4, "Invalidating the Data Cache."
22    SPD    Speculative cache access disable. This bit must be cleared for instruction cache locking. See Section C.5.2.3.13, "MPC755 Prefetching Considerations."
25    DCFA   Data cache flush assist. This bit must be set for data cache flushing. See Section C.5.2.3.4, "Invalidating the Data Cache."
29    BHT    Branch history table enable. This bit must be cleared for instruction cache locking. See Section C.5.2.3.13, "MPC755 Prefetching Considerations."
Table C-7. HID2 Bits Used to Perform Cache Locking
Bits   Name   Description
16-18  IWLCK  Instruction cache way lock. These bits are used to lock individual ways in the instruction cache. See Section C.5.2.3.15, "Instruction Cache Way Locking."
24-26  DWLCK  Data cache way lock. These bits are used to lock individual ways in the data cache. See Section C.5.2.3.7, "Data Cache Way Locking."
Table C-8. MSR Bits Used to Perform Cache Locking
Bits  Name  Description
16    EE    External interrupt enable. This bit must be cleared during instruction and data cache loading. See Section C.5.2.3.3, "Disabling Exceptions for Data Cache Locking."
19    ME    Machine check enable. This bit must be cleared during instruction and data cache loading. See Section C.5.2.3.3, "Disabling Exceptions for Data Cache Locking."
26    IR    Instruction address translation. This bit must be set to enable instruction address translation by the MMU. See Section C.5.2.3.2, "Address Translation for Data Cache Locking."
27    DR    Data address translation. This bit must be set to enable data address translation by the MMU. See Section C.5.2.3.2, "Address Translation for Data Cache Locking."
C.5.2.3 Performing Data and Instruction Cache Locking
This section outlines the basic procedures for locking the data and instruction caches and provides some example code for locking the caches. The procedures for the data cache are described first, followed by the corresponding sections for locking the instruction cache. The basic procedures for cache locking are:
* Enabling the cache
* Enabling address translation for example code
* Disabling exceptions
* Loading the cache
* Locking the cache (entire cache locking or cache way locking)
In addition, this section describes how to invalidate the data and instruction caches, even when they are locked. The following sections describe the procedures for performing data cache locking on the MPC755.
C.5.2.3.1 Enabling the Data Cache
To lock the data cache, the data cache enable bit HID0[DCE], bit 17, must be set. The assembly code below enables the data cache:
# Enable the data cache. This corresponds
# to setting DCE bit in HID0 (bit 17)
        mfspr  r1, HID0
        ori    r1, r1, 0x4000
        sync
        mtspr  HID0, r1
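The immediate values in the listings follow from PowerPC bit numbering, in which bit 0 is the most-significant bit of a 32-bit register. The following Python sketch converts the bit numbers of Table C-6 into masks so they can be checked against the ori operands used in this section. The helper and dictionary names are illustrative only.

```python
def ppc_bit_mask(bit, width=32):
    """Mask for one bit in PowerPC numbering (bit 0 is the MSB)."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


# Bit numbers from Table C-6 (HID0).
HID0_BITS = {
    "ICE": 16, "DCE": 17, "ILOCK": 18, "DLOCK": 19, "ICFI": 20,
    "DCFI": 21, "SPD": 22, "DCFA": 25, "BHT": 29,
}
HID0_MASKS = {name: ppc_bit_mask(bit) for name, bit in HID0_BITS.items()}

for name, mask in HID0_MASKS.items():
    print(f"HID0[{name}] = 0x{mask:04X}")
```

For example, HID0[DCE] (bit 17) yields 0x4000, the operand of the ori instruction in the listing above.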
C.5.2.3.2 Address Translation for Data Cache Locking
Two distinct memory areas must be set up to enable cache locking:
* The first area is where the code that performs the locking resides and is executed from.
* The second area is where the data to be locked resides.
Both areas of memory must be in locations that are translated by the memory management unit (MMU). This translation can be performed either with the page table or the block address translation (BAT) registers. For the purposes of the cache locking example in this document, the two areas of memory are defined using the BAT registers. The first area is a 1-Mbyte area in the upper region of memory that contains the code performing the cache locking. The second area is a 256-Mbyte block of memory (not all of the 256-Mbytes of memory is locked in the cache; this area is set up as an example) that contains the data to lock. Both memory areas use identity translation (the logical memory address equals the physical memory address). Table C-9 summarizes the BAT settings used in this example.
Table C-9. Example BAT Settings for Cache Locking
Area    Base Address  Memory Size  WIMG Bits   BATU Setting  BATL Setting
First   0xFFF0_0000   1 Mbyte      0b0100 (1)  0xFFF0_001F   0xFFF0_0002 (1)
Second  0x0000_0000   256 Mbyte    0b0000      0x0000_1FFF   0x0000_0002
(1) Cache-inhibited memory is not a requirement for data cache locking. A setting of 0xFFF0_0002 with a corresponding WIMG of 0b0000 marks the memory area as cacheable.
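The BATU and BATL values in the table can be checked by decoding them into fields. The following Python sketch assumes the standard 32-bit PowerPC BAT register layout (BEPI/BRPN in bits 0-14, BL in bits 19-29, Vs and Vp in bits 30-31, WIMG in bits 25-28, PP in bits 30-31). That layout is taken from the PowerPC architecture rather than from this appendix, and the function names are hypothetical.

```python
def decode_batu(value):
    """Split an upper BAT value into fields (assumed 32-bit PowerPC layout)."""
    bl = (value >> 2) & 0x7FF                  # block length mask, bits 19-29
    return {
        "bepi_base": value & 0xFFFE0000,       # effective base, bits 0-14
        # Valid BL encodings are runs of low-order ones; each step doubles
        # the block size, starting at 128 Kbyte for BL = 0.
        "block_bytes": (bl + 1) * 128 * 1024,
        "vs": (value >> 1) & 1,                # valid in supervisor mode
        "vp": value & 1,                       # valid in user mode
    }


def decode_batl(value):
    """Split a lower BAT value into fields (assumed 32-bit PowerPC layout)."""
    wimg = (value >> 3) & 0xF                  # bits 25-28
    return {
        "brpn_base": value & 0xFFFE0000,       # physical base, bits 0-14
        "wimg": wimg,
        "cache_inhibited": bool(wimg & 0b0100),
        "pp": value & 0x3,                     # protection bits, 30-31
    }


print(decode_batu(0xFFF0001F)["block_bytes"] // (1024 * 1024))   # 1
print(decode_batu(0x00001FFF)["block_bytes"] // (1024 * 1024))   # 256
print(decode_batl(0xFFF00002)["cache_inhibited"])                # False
print(decode_batl(0xFFF00022)["cache_inhibited"])                # True
```

Under this layout, 0xFFF0_0002 decodes to WIMG = 0b0000 (cacheable), and 0xFFF0_0022 decodes to WIMG = 0b0100 (cache-inhibited); both have PP = 0b10.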
The block address translation upper (BATU) and block address translation lower (BATL) settings in Table C-9 can be used for both instruction block address translation (IBAT) and data block address translation (DBAT) registers. After the BAT registers have been set up, the MMU must be enabled. The assembly code below enables both instruction and data memory address translation:
# Enable instruction and data memory address translation. This
# corresponds to setting IR and DR in the MSR (bits 26 & 27)
        mfmsr  r1
        ori    r1, r1, 0x0030
        mtmsr  r1
        sync
C.5.2.3.3 Disabling Exceptions for Data Cache Locking
To ensure that exception handler routines do not execute while the cache is being loaded (which could possibly pollute the cache with undesired contents), all exceptions must be disabled. This is accomplished by clearing the appropriate bits in the machine state register (MSR). See Table C-10 for the bits within the MSR that must be cleared to ensure that exceptions are disabled.
Table C-10. MSR Bits for Disabling Exceptions
Bit  Name     Description
16   EE       External interrupt enable
19   ME       Machine check enable
20   FE0 (1)  Floating-point exception mode 0
23   FE1 (1)  Floating-point exception mode 1
(1) The floating-point exception may not need to be disabled because the example code shown in this document that performs cache locking does not execute any floating-point operations.
The following assembly code disables all asynchronous exceptions:
# Clear the following bits from the MSR:
#   EE  (16)   ME  (19)
#   FE0 (20)   FE1 (23)
        mfmsr  r1
        lis    r2, 0xFFFF
        ori    r2, r2, 0x66FF
        and    r1, r1, r2
        mtmsr  r1
        sync
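The AND mask built by the lis/ori pair can be derived from the bit numbers in the MSR table. The following Python sketch clears each listed bit from an all-ones 32-bit value and then splits the result into the two 16-bit halves used by lis and ori. The helper names are illustrative only.

```python
def ppc_bit_mask(bit, width=32):
    """Mask for one bit in PowerPC numbering (bit 0 is the MSB)."""
    return 1 << (width - 1 - bit)


def clear_mask(bits, width=32):
    """AND mask that clears every PowerPC-numbered bit in `bits`."""
    to_clear = 0
    for bit in bits:
        to_clear |= ppc_bit_mask(bit, width)
    return ((1 << width) - 1) & ~to_clear


# Bits that must be cleared to disable exceptions: EE, ME, FE0, FE1.
MSR_EXCEPTION_BITS = {"EE": 16, "ME": 19, "FE0": 20, "FE1": 23}

mask = clear_mask(MSR_EXCEPTION_BITS.values())
upper_half = mask >> 16        # operand for lis
lower_half = mask & 0xFFFF     # operand for ori

print(f"0x{mask:08X}")                                # 0xFFFF66FF
print(f"0x{upper_half:04X} 0x{lower_half:04X}")       # 0xFFFF 0x66FF
```

The same helper gives the clearing masks used for other fields, provided the bit numbers are taken from the corresponding register table.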
C.5.2.3.4 Invalidating the Data Cache
If a non-empty data cache has modified data, and the data cannot be discarded, the data cache must be flushed before it can be invalidated. Data cache flushing is accomplished by filling the data cache with known data and performing a flash invalidate or a series of dcbf instructions that force a flush and invalidation of the data cache block. The following code sequence shows how to flush the data cache:
# r6 contains a block-aligned address in memory with which to fill
# the data cache. For this example, address 0x0 is used
        li     r6, 0x0
# CTR = number of data blocks to load
# Number of blocks = (32K) / (32 Bytes/block)
#                  = 2^15 / 2^5 = 2^10 = 0x400
# (32-Kbyte data cache, see Table C-5)
        li     r1, 0x400
        mtctr  r1
# Save the total number of blocks in cache to r8
        mr     r8, r1
# Load the entire cache with known data
loop:   lwz    r2, 0(r6)
        addi   r6, r6, 32      # Find the next block
        bdnz   loop            # Decrement the counter, and
                               # branch if CTR != 0
# Now, flush the cache with dcbf instructions
        li     r6, 0x0         # Address of first block
        mtctr  r8              # Number of blocks
loop2:  dcbf   r0, r6
        addi   r6, r6, 32      # Find the next block
        bdnz   loop2           # Decrement the counter, and
                               # branch if CTR != 0
If the content of the data cache does not need to be flushed to memory, the cache can be directly invalidated. The entire data cache is invalidated through the data cache flash invalidate bit HID0[DCFI], bit 21. Setting HID0[DCFI] and then immediately clearing it causes the entire data cache to be invalidated. The following assembly code invalidates the entire data cache (does not flush modified entries):
# Set and then clear the HID0[DCFI] bit, bit 21
        mfspr  r1, HID0
        mr     r2, r1
        ori    r1, r1, 0x0400
        mtspr  HID0, r1
        mtspr  HID0, r2
        sync
C.5.2.3.5 Loading the Data Cache
This section explains loading data into the data cache. The data cache can be loaded in several ways. The example in this document loads the data from memory. The following assembly code loads the data cache:
# Assuming interrupts are turned off, cache has been flushed,
# MMU on, and loading from contiguous cacheable memory.
# r6  = Starting address of data to lock
# r20 = Temporary register for loading into
# CTR = Number of cache blocks to lock
loop:   lwz    r20, 0(r6)      # Load data into d-cache
        addi   r6, r6, 32      # Find next block to load
        bdnz   loop            # CTR = CTR-1, branch if CTR != 0
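The loading loop expects a block-aligned starting address in r6 and a block count in CTR. The following Python sketch shows one way to compute both for an arbitrary byte range, assuming the 32-byte block size used throughout this section. The function name is hypothetical.

```python
BLOCK_BYTES = 32   # cache block size used by the examples in this section


def blocks_to_load(start, size):
    """Return (aligned_start, block_count) covering [start, start + size).

    aligned_start is the value to place in r6 and block_count the value
    to place in CTR before entering the loading loop.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    aligned_start = start - (start % BLOCK_BYTES)
    end = start + size
    block_count = (end - aligned_start + BLOCK_BYTES - 1) // BLOCK_BYTES
    return aligned_start, block_count


print(blocks_to_load(0x1000, 4096))   # (4096, 128)
print(blocks_to_load(0x1010, 32))     # (4096, 2)
```

The second call shows that a 32-byte object that is not block-aligned straddles two cache blocks, so both must be loaded before locking.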
C.5.2.3.6 Entire Data Cache Locking
Locking of the entire data cache is controlled by the data cache lock bit (HID0[DLOCK], bit 19). Setting HID0[DLOCK] to 1 locks the entire data cache. To unlock the data cache, HID0[DLOCK] must be cleared to 0. Setting the DLOCK bit must be preceded by a sync instruction to prevent the data cache from being locked during a data access. The following assembly code locks the entire data cache:
# Set the DLOCK bit in HID0 (bit 19)
        mfspr  r1, HID0
        ori    r1, r1, 0x1000
        sync
        mtspr  HID0, r1
C.5.2.3.7 Data Cache Way Locking
Data cache way locking is controlled by HID2[DWLCK], bits 24-26. Table C-11 shows the HID2[DWLCK 0-2] settings for the MPC755 embedded processor.
Table C-11. MPC755 DWLCK[0-2] Encodings
DWLCK[0-2]  Ways Locked
0b000       No ways locked
0b001       Way 0 locked
0b010       Ways 0 and 1 locked
0b011       Ways 0, 1, and 2 locked
0b100       Ways 0, 1, 2, and 3 locked
0b101       Ways 0, 1, 2, 3, and 4 locked
0b110       Ways 0, 1, 2, 3, 4, and 5 locked
0b111       Reserved
The following assembly code locks way0 of the MPC755 data cache:
# Lock way0 of the data cache
# This corresponds to setting dwlck(0:2) to 0b001 (bits 24-26)
        mfspr  r1, HID2
        lis    r2, 0xFFFF
        ori    r2, r2, 0xFF1F
        and    r1, r1, r2
        ori    r1, r1, 0x0020
        sync
        mtspr  HID2, r1
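The clear-then-set sequence above is a read-modify-write of a 3-bit field. The following Python sketch models that operation for any field given in PowerPC bit numbering, so the masks in the listing (0xFFFF_FF1F and 0x0020) can be checked for other DWLCK values. The function name is illustrative only.

```python
def insert_field(reg, first_bit, last_bit, value, width=32):
    """Return `reg` with PowerPC-numbered bits first_bit..last_bit = value.

    Bit 0 is the most-significant bit, so the field is shifted left by
    (width - 1 - last_bit) positions.
    """
    field_width = last_bit - first_bit + 1
    if value < 0 or value >> field_width:
        raise ValueError("value does not fit in the field")
    shift = width - 1 - last_bit
    field_mask = ((1 << field_width) - 1) << shift
    cleared = reg & ~field_mask & ((1 << width) - 1)
    return cleared | (value << shift)


# HID2[DWLCK] occupies bits 24-26; 0b001 locks way 0.
print(f"0x{insert_field(0x00000000, 24, 26, 0b001):08X}")   # 0x00000020
print(f"0x{insert_field(0xFFFFFFFF, 24, 26, 0b001):08X}")   # 0xFFFFFF3F
```

The second result shows that bits outside the field are preserved, which is the purpose of the and instruction in the listing.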
C.5.2.3.8 Invalidating the Data Cache (Even if Locked)
There are two methods to invalidate the data cache:
* Invalidate the entire cache by setting and then immediately clearing the data cache flash invalidate bit HID0[DCFI], bit 21. Even when the cache is locked, toggling the DCFI bit invalidates all of the data cache.
* The data cache block invalidate (dcbi) instruction can be used to invalidate individual cache blocks. The dcbi instruction invalidates blocks locked (either entire or way-locked) within the data cache.
The following sections describe the procedures for performing instruction cache locking on the MPC755.
C.5.2.3.9 Enabling the Instruction Cache
To lock the instruction cache, the instruction cache enable bit HID0[ICE], bit 16, must be set. The assembly code below enables the instruction cache:
# Enable the instruction cache. This corresponds
# to setting ICE bit in HID0 (bit 16)
        mfspr  r1, HID0
        ori    r1, r1, 0x8000
        sync
        mtspr  HID0, r1
C.5.2.3.10 Address Translation for Instruction Cache Locking
Two distinct memory areas must be set up to enable cache locking:
* The first area is where the code that performs the locking resides and is executed from.
* The second area is where the instructions to be locked reside.
Both areas of memory must be in locations that are translated by the memory management unit (MMU). This translation can be performed either with the page table or the block address translation (BAT) registers. For the purposes of the cache locking example in this document, two areas of memory are defined using the BAT registers. The first area is a 1-Mbyte area in the upper region of memory that contains the code performing the cache locking. This area of memory must be cache-inhibited for instruction cache locking. The second area is a 256-Mbyte block of memory (not all of the 256-Mbytes of memory is locked in the cache; this area is set up as an example) that contains the instructions to lock. Both memory areas use identity translation (the logical memory address equals the physical memory address). Table C-12 summarizes the BAT settings used in this example.
Table C-12. Example BAT Settings for Cache Locking
Area    Base Address  Memory Size  WIMG Bits   BATU Setting  BATL Setting
First   0xFFF0_0000   1 Mbyte      0b0100 (1)  0xFFF0_001F   0xFFF0_0022 (1)
Second  0x0000_0000   256 Mbyte    0b0000      0x0000_1FFF   0x0000_0002
(1) 0xFFF0_0022 defines a cache-inhibited memory area used for instruction cache locking, and corresponds to a WIMG of 0b0100. Cache-inhibited memory is not a requirement for data cache locking. A setting of 0xFFF0_0002 with a corresponding WIMG of 0b0000 marks the memory area as cacheable.
The block address translation upper (BATU) and block address translation lower (BATL) settings in Table C-12 can be used for both instruction block address translation (IBAT) and data block address translation (DBAT) registers. After the BAT registers have been set up, the MMU must be enabled.
The assembly code below enables both instruction and data memory address translation:
# Enable instruction and data memory address translation. This
# corresponds to setting IR and DR in the MSR (bits 26 & 27)
        mfmsr  r1
        ori    r1, r1, 0x0030
        mtmsr  r1
        sync
C.5.2.3.11 Disabling Exceptions for Instruction Cache Locking
To ensure that exception handler routines do not execute while the cache is being loaded (which could possibly pollute the cache with undesired contents), all exceptions must be disabled. This is accomplished by clearing the appropriate bits in the machine state register (MSR). See Table C-13 for the bits within the MSR that must be cleared to ensure that exceptions are disabled.
Table C-13. MSR Bits for Disabling Exceptions
Bit  Name     Description
16   EE       External interrupt enable
19   ME       Machine check enable
20   FE0 (1)  Floating-point exception mode 0
23   FE1 (1)  Floating-point exception mode 1
(1) The floating-point exception may not need to be disabled because the example code shown in this document that performs cache locking does not execute any floating-point operations.
The following assembly code disables all asynchronous exceptions:
# Clear the following bits from the MSR:
#   EE  (16)   ME  (19)
#   FE0 (20)   FE1 (23)
        mfmsr  r1
        lis    r2, 0xFFFF
        ori    r2, r2, 0x66FF
        and    r1, r1, r2
        mtmsr  r1
        sync
C.5.2.3.12 Preloading Instructions into the Instruction Cache
To optimize performance, processors that implement the PowerPC architecture automatically prefetch instructions into the instruction cache. This feature can be used to preload explicit instructions into the cache even when it is known that their execution will be canceled. Although the execution of the instructions is canceled, the instructions remain valid in the instruction cache.
Because instructions are intentionally executed speculatively, care must be taken to ensure that all I/O memory is marked guarded. Otherwise, speculative loads and stores to I/O space could potentially cause data loss. See the Programming Environments Manual for a full discussion of guarded memory. The code that prefetches must be in cache-inhibited memory as in the following example:
# Assuming exceptions are disabled, cache has been flushed,
# the MMU is on, and we are executing in a cache-inhibited
# location in memory
# LR and r6 = Starting address of code to lock
# CTR = Number of cache blocks to lock
# r2 = non-zero numerator and denominator
# `loop' must begin on an 8-byte boundary to ensure that
# the divw and beqlr+ are fetched on the same cycle.
        .orig  0xFFF04000
loop:   divw.  r2, r2, r2      # LONG divide w/ non-zero result
        beqlr+                 # Cause the prefetch to happen
        addi   r6, r6, 32      # Find next block to prefetch
        mtlr   r6              # set the next block
        bdnz-  loop            # Decrement the counter and
                               # branch if CTR != 0
In the above example, both the divw and beqlr+ instructions are fetched at the same time (this assumes a 64-bit 60x data bus; the preloading code does not work for a 32-bit data bus) due to their placement on a double-word boundary. The divide instruction was chosen because it takes many cycles to execute. During execution of the divide, the processor starts fetching instructions speculatively at the target destination of the branch instruction. The speculation occurs because the branch is statically predicted as taken. This speculative fetching causes the cache block that is pointed to by the link register (LR) to be loaded into the cache. Because the divw. instruction always produces a non-zero result, the beqlr+ is not taken and execution of all speculatively fetched instructions is canceled. However, the instructions remain valid in the cache. If the destination instruction stream contains an unconditional branch to another memory location, it is possible to also prefetch the destination of the unconditional branch instruction. This does not cause a problem if the destination of the unconditional branch is also inside the area of memory that needs to be preloaded. But if the destination of the unconditional branch is not in the area of memory to be loaded, then care must be taken to ensure that the branch destination is to an area of memory that is cache inhibited. Otherwise, unintentional instructions may be locked in the cache and the desired instructions may not be in their expected way within the cache.
C.5.2.3.13 MPC755 Prefetching Considerations
Because the instruction cache preloading code relies on static branch prediction to ensure that the beqlr+ instruction is predicted as taken, speculative cache access must be enabled. Speculative cache access is controlled by the speculative cache access disable bit HID0[SPD], bit 22. This bit must be cleared to ensure that instructions can be speculatively loaded from the instruction cache. Also, the instruction cache preloading code will not work when dynamic branch prediction is enabled. To ensure that MPC755 dynamic branch prediction is disabled, the branch history table bit HID0[BHT], bit 29, must be cleared. By default, the BHT is cleared out of reset.
C.5.2.3.14 Entire Instruction Cache Locking
Locking the entire instruction cache is controlled by the instruction cache lock bit (HID0[ILOCK], bit 18). Setting HID0[ILOCK] locks the entire instruction cache, and clearing HID0[ILOCK] allows the instruction cache to operate normally. The setting of HID0[ILOCK] should be preceded by an isync instruction to prevent the instruction cache from being locked during an instruction access. The following assembly code locks the contents of the entire instruction cache:
# Set the ILOCK bit in HID0 (bit 18)
        mfspr  r1, HID0
        ori    r1, r1, 0x2000
        isync
        mtspr  HID0, r1
C.5.2.3.15 Instruction Cache Way Locking
Instruction cache way locking is controlled by HID2[IWLCK], bits 16-18. Table C-14 shows the HID2[IWLCK 0-2] settings for the MPC755 embedded processor.
Table C-14. MPC755 IWLCK[0-2] Encodings
IWLCK[0-2]  Ways Locked
0b000       No ways locked
0b001       Way 0 locked
0b010       Ways 0 and 1 locked
0b011       Ways 0, 1, and 2 locked
0b100       Ways 0, 1, 2, and 3 locked
0b101       Ways 0, 1, 2, 3, and 4 locked
0b110       Ways 0, 1, 2, 3, 4, and 5 locked
0b111       Reserved
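In the encodings above, the field value equals the number of locked ways. The following Python sketch uses that observation with the 4-Kbyte way size to estimate which encoding a given amount of code or data requires. It assumes the locked region is contiguous and block-aligned, so that it spreads evenly across the sets; the function name is hypothetical.

```python
WAY_BYTES = 4 * 1024     # way size of the MPC755 L1 caches
MAX_LOCKED_WAYS = 6      # way0-way5; at least two ways stay unlocked


def way_lock_encoding(locked_bytes):
    """Return the IWLCK/DWLCK value needed to lock `locked_bytes`.

    Assumes a contiguous, block-aligned region, so each way holds one
    4-Kbyte slice of it. The encoding is simply the number of ways.
    """
    if locked_bytes < 0:
        raise ValueError("size must not be negative")
    ways = -(-locked_bytes // WAY_BYTES)      # ceiling division
    if ways > MAX_LOCKED_WAYS:
        raise ValueError("region exceeds six lockable ways (24 Kbyte)")
    return ways


print(bin(way_lock_encoding(4096)))    # 0b1
print(bin(way_lock_encoding(4097)))    # 0b10
print(bin(way_lock_encoding(24576)))   # 0b110
```

A region that is not contiguous, or that is not block-aligned, can place more blocks in one set than this estimate allows, so the result should be treated as a lower bound in that case.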
The assembly code below locks way0 of the MPC755 instruction cache:
# Lock way0 of the instruction cache
# This corresponds to setting iwlck(0:2) to 0b001 (bits 16-18)
        mfspr  r1, HID2
        lis    r2, 0xFFFF
        ori    r2, r2, 0x1FFF
        and    r1, r1, r2
        ori    r1, r1, 0x2000
        isync
        mtspr  HID2, r1
C.5.2.3.16 Invalidating the Instruction Cache (Even if Locked)
There are two methods to invalidate the instruction cache. In the first method, the entire cache is invalidated by setting and then immediately clearing the instruction cache flash invalidate bit (HID0[ICFI], bit 20). Even when the cache is locked, toggling the ICFI bit invalidates all of the instruction cache. The following assembly code invalidates the entire instruction cache:
# Set and then clear the HID0[ICFI] bit, bit 20
        mfspr  r1, HID0
        mr     r2, r1
        ori    r1, r1, 0x0800
        mtspr  HID0, r1
        mtspr  HID0, r2
        sync
In the second method, the instruction cache block invalidate (icbi) instruction can be used to invalidate individual cache blocks. The icbi instruction invalidates blocks in an entirely locked instruction cache for the MPC750 and the MPC755 microprocessors. On the MPC755 embedded processor, the icbi instruction invalidates way-locked blocks within the instruction cache.
C.6 MPC755 Exceptions (Chapter 4)
The exception model for the MPC755 is the same as that described in Chapter 4, "Exceptions," except as described in this section. For both the MPC750 and MPC755, the thermal assist unit, the decrementer register, and the performance monitor cannot be used in any combination at the same time. If exceptions for any two of these functional blocks are enabled together, multiple exceptions caused by any of these three blocks cause unpredictable results.
The MPC755 has three new exceptions used to support software table search operations (the same as the MPC603e). Software table searching is enabled with the setting of HID2[SWT_EN], bit 12. When this bit is cleared, the MPC755 uses the hardware table searching mechanism of the MPC750 when a miss occurs in an on-chip TLB. When HID2[SWT_EN] = 1, software table searching is enabled and a TLB miss causes one of the exceptions described in this section. See Section C.4.3, "tlbld and tlbli Instructions," for a detailed explanation of the tlbli and tlbld instructions used to load the TLBs. See C.7, "MPC755 Memory Management (Chapter 5)," for a more detailed explanation of the other resources used to perform table search operations in software and example exception handlers. The three MMU exceptions used for software table search operations are described in Table C-15.
Table C-15. Software Table Search Exceptions and Conditions
Exception Type           Vector Offset (hex)  Causing Conditions
Instruction TLB miss     01000                An instruction TLB miss exception is caused when an effective address for an instruction fetch cannot be translated by the ITLB.
Data TLB miss for load   01100                A data TLB miss for load exception is caused when an effective address for a data load operation cannot be translated by the DTLB.
Data TLB miss for store  01200                A data TLB miss for store exception is caused when an effective address for a data store operation cannot be translated by the DTLB, or where a DTLB hit occurs, and the change bit in the PTE must be set due to a data store operation.
The SRR0, SRR1, and MSR registers are used by the MPC755 when an exception occurs. Register settings for the instruction and data TLB miss exceptions are described in Table C-16.
Table C-16. Instruction and Data TLB Miss Exceptions--Register Settings
Register  Setting Description
SRR0      Set to the address of the next instruction to be executed in the program for which the TLB miss exception was generated.
SRR1      0-3    Loaded from condition register CR0 field
          4-11   Cleared
          12     KEY. Key for TLB miss (either Ks or Kp from segment register, depending on whether the access is a user or supervisor access). See Figure C-5.
          13     D/I. Data or instruction access
                 0 = data TLB miss
                 1 = instruction TLB miss
          14     WAY. Next TLB set to be replaced (set per LRU)
                 0 = replace TLB associativity set 0
                 1 = replace TLB associativity set 1
          15     S/L. Store or load data access
                 0 = data TLB miss on load
                 1 = data TLB miss on store (or C = 0)
          16-31  Loaded from bits 16-31 of the MSR
MSR (1)   POW  0     EE   0     FE0  0     IR  0
          ILE  --    PR   0     SE   0     DR  0
          IP   --    FP   0     BE   0     RI  0
                     ME   --    FE1  0     LE  Set to value of ILE
(1) MSR[14] (the TGPR bit) of the MPC603e processor provided control for a separate set of four temporary GPRs that could be used as general-purpose registers in the TLB miss exception handler routines. MSR[14] is reserved on the MPC755, and the new SPRG[4-7] can be used for the TLB miss handler code.
The MPC755 automatically saves the values of CR[CR0] of the executing context to SRR1[0-3]. Thus, the exception handler can set CR[CR0] bits and branch accordingly in the exception handler routine, without having to save the existing CR[CR0] bits. However, the exception handler must restore these bits to CR[CR0] before executing the rfi instruction. Also saved in SRR1 are two bits identifying the type of miss (SRR1[D/I] identifies instruction or data, and SRR1[S/L] identifies a store or load). Additionally, SRR1[WAY] identifies the associativity class of the TLB entry selected for replacement by the LRU algorithm. The software can change this value, effectively overriding the replacement algorithm. Finally, the SRR1[KEY] bit is used by the table search software to determine if there is a protection violation associated with the access (useful on data write misses for determining if the C bit should be updated in the table). The key bit, saved in SRR1 for a TLB miss exception, is derived as shown in Figure C-5.
Select KEY from segment register:
If MSR[PR] = 0, KEY = Ks
If MSR[PR] = 1, KEY = Kp
Figure C-5. Derivation of Key Bit for SRR1
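The SRR1 fields and the key selection of Figure C-5 can be expressed as bit extractions. The following Python sketch decodes an SRR1 value saved by a TLB miss exception, using the bit positions from Table C-16 in PowerPC numbering (bit 0 is the most-significant bit). The function and key names are illustrative, and the sample SRR1 value is invented for the demonstration.

```python
def decode_srr1_tlb_miss(srr1):
    """Split an SRR1 value saved on a TLB miss into its fields."""
    return {
        "cr0": (srr1 >> 28) & 0xF,                    # bits 0-3, saved CR0
        "key": (srr1 >> 19) & 1,                      # bit 12, KEY
        "instruction_miss": bool((srr1 >> 18) & 1),   # bit 13, D/I
        "way": (srr1 >> 17) & 1,                      # bit 14, WAY
        "store_miss": bool((srr1 >> 16) & 1),         # bit 15, S/L
        "msr_16_31": srr1 & 0xFFFF,                   # bits 16-31 of the MSR
    }


def select_key(msr_pr, ks, kp):
    """Figure C-5: KEY is Ks in supervisor mode, Kp in user mode."""
    return kp if msr_pr else ks


# Invented sample: CR0 = 0b0010, KEY = 1, data miss, WAY = 1, store.
fields = decode_srr1_tlb_miss(0x200B0000)
print(fields["cr0"], fields["key"], fields["way"])             # 2 1 1
print(fields["instruction_miss"], fields["store_miss"])        # False True
print(select_key(msr_pr=1, ks=0, kp=1))                        # 1
```

A handler would use the same positions to restore CR[CR0] from SRR1[0-3] before executing rfi.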
C.6.1 Instruction TLB Miss Exception (0x01000)
When the effective address for an instruction fetch operation cannot be translated by the ITLBs or IBATs, an instruction TLB miss exception is generated. Register settings for the instruction and data TLB miss exceptions are described in Table C-16. If the instruction TLB miss exception handler fails to find the desired PTE, then a page fault must be synthesized. The handler must restore the machine state before invoking the ISI exception (0x00400). When an instruction TLB miss exception is taken, instruction execution for the handler begins at offset 0x01000 from the physical base address indicated by MSR[IP].
C.6.2 Data TLB Miss for Load Exception (0x01100)
When the effective address for a data load or cache operation cannot be translated by the DTLBs or DBATs, a data TLB miss for load exception is generated. Register settings for the instruction and data TLB miss exceptions are described in Table C-16. If the data TLB miss exception handler fails to find the desired PTE, then a page fault must be synthesized. The handler must restore the machine state before invoking the DSI exception (0x00300).
When a data TLB miss for load exception is taken, instruction execution for the handler begins at offset 0x01100 from the physical base address indicated by MSR[IP].
C.6.3 Data TLB Miss for Store Exception (0x01200)
When the effective address for a data store or cache operation cannot be translated by the DTLBs or DBATs, a data TLB miss for store exception is generated. The data TLB miss for store exception is also taken when the changed bit (C = 0) for a DTLB entry needs to be updated for a store operation. Register settings for the instruction and data TLB miss exceptions are described in Table C-16. If the data TLB miss exception handler fails to find the desired PTE, then a page fault must be synthesized. The handler must restore the machine state before invoking the DSI exception (0x00300). When a data TLB miss for store exception is taken, instruction execution for the handler begins at offset 0x01200 from the physical base address indicated by MSR[IP].
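The handler entry points for the three TLB miss exceptions follow from the vector offsets and MSR[IP]. The following Python sketch assumes the usual PowerPC convention that MSR[IP] = 0 places the vectors at 0x000n_nnnn and MSR[IP] = 1 places them at 0xFFFn_nnnn; that convention comes from the architecture rather than from this section, and the names are illustrative.

```python
# Vector offsets from Table C-15.
TLB_MISS_OFFSETS = {
    "instruction TLB miss": 0x01000,
    "data TLB miss for load": 0x01100,
    "data TLB miss for store": 0x01200,
}


def vector_address(offset, msr_ip):
    """Physical address where the handler begins executing."""
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + offset


for name, offset in TLB_MISS_OFFSETS.items():
    low = vector_address(offset, msr_ip=0)
    high = vector_address(offset, msr_ip=1)
    print(f"{name}: 0x{low:08X} (IP=0), 0x{high:08X} (IP=1)")
```

The vectors are 0x100 bytes apart, so each handler has room for 64 instructions before it runs into the next vector unless it branches elsewhere.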
C.7 MPC755 Memory Management (Chapter 5)
The MPC755 implements a virtual memory management scheme that is compliant with the PowerPC architecture for 32-bit microprocessors and that implements the software table searching features of the MPC603e. The organization of the memory management unit (MMU) hardware is as follows:
* Same as MPC750
  -- 128-entry, two-way set associative data TLB
  -- 128-entry, two-way set associative instruction TLB
  -- Sixteen segment registers
  -- Automatic hardware table search operations
* New features in the MPC755
  -- 4- or 8-entry (HID2-selectable), fully associative instruction BAT array
  -- 4- or 8-entry (HID2-selectable), fully associative data BAT array
  -- Selectable software table search functionality by setting HID2[SWT_EN], bit 12
The MPC755 has a set of implementation-specific registers, exceptions, and instructions that facilitate very efficient software searching of the page tables in memory. This section describes those resources and provides three example code sequences that can be used in an MPC755 system for an efficient search of the translation tables in software. These three code sequences can be used as handlers for the three exceptions requiring access to the PTEs in the page tables in memory--instruction TLB miss, data TLB miss on load, and data TLB miss on store exceptions.
Note that the remainder of the MMU definition and rules about updating the page tables in memory for the MPC755 are the same as that for the MPC750.
C.7.1 Software Table Search Resources
In addition to setting up the translation page tables in memory, the system software must assist the processor in loading PTEs into the on-chip TLBs. When a required TLB entry is not found in the appropriate TLB, the processor vectors to one of the three TLB miss exception handlers so that the software can perform a table search operation and load the TLB. When this occurs, the processor automatically saves information about the access and the executing context. Table C-17 provides a summary of the implementation-specific exceptions, registers, and instructions that can be used by the TLB miss exception handler software in MPC755 systems. See Section C.4.3, "tlbld and tlbli Instructions," for detailed information about the operation of the tlbli and tlbld instructions and C.6, "MPC755 Exceptions (Chapter 4)," for more information about exception processing on the MPC755.
Table C-17. Implementation-Specific Resources for Software Table Search Operations--Summary
Resource      Name                                                    Description
Exceptions    Instruction TLB miss exception (vector offset 0x1000)   No matching entry found in ITLB
              Data TLB miss for load exception (vector offset 0x1100) No matching entry found in DTLB for a load data access
              Data TLB miss for store exception--also caused when changed bit must be updated (vector offset 0x1200)   No matching entry found in DTLB for a store data access or matching DTLB entry has C = 0 and access is a store
Registers     IMISS and DMISS    When a TLB miss exception occurs, the IMISS or DMISS register contains the 32-bit effective address of the instruction or data access that caused the miss exception.
              ICMP and DCMP      The ICMP and DCMP registers contain the word to be compared with the first word of a PTE in the table search software routine to determine if a PTE contains the address translation for the instruction or data access. The contents of ICMP and DCMP are automatically derived by the MPC755 when a TLB miss exception occurs.
              HASH1 and HASH2    The HASH1 and HASH2 registers contain the primary and secondary PTEG addresses that correspond to the address causing a TLB miss. These PTEG addresses are automatically derived by the MPC755 by performing the primary and secondary hashing function on the contents of IMISS or DMISS, for an ITLB or DTLB miss exception, respectively.
              RPA                The system software loads a TLB entry by loading the second word of the matching PTE entry into the RPA register and then executing the tlbli or tlbld instruction (for loading the ITLB or DTLB, respectively).
Instructions  tlbli rB           Loads the contents of the ICMP and RPA registers into the ITLB entry selected by <ea> and SRR1[WAY]
              tlbld rB           Loads the contents of the DCMP and RPA registers into the DTLB entry selected by <ea> and SRR1[WAY]
In addition, the MPC755 contains four additional SPRG registers SPRG[4-7] that have been implemented to save and restore general-purpose registers used by the exception handler. This is a replacement for having the MSR[TGPR] bit of the MPC603e and four temporary general-purpose registers. Note that the MSR[TGPR] bit is not implemented in the MPC755. If software table searching is not enabled, then these registers may be used for any supervisor purpose.
C.7.2 Software Table Search Registers
This section describes the format of the implementation-specific SPRs that are not defined by the PowerPC architecture, but are used by the TLB miss exception handlers. These registers can be accessed by supervisor-level instructions only. Any attempt to access these SPRs with user-level instructions results in a privileged instruction program exception. Because DMISS, IMISS, DCMP, ICMP, HASH1, HASH2, and RPA are used to access the translation tables for software table search operations, they should only be accessed when address translation is disabled (that is, MSR[IR] = 0 and MSR[DR] = 0). Note that MSR[IR] and MSR[DR] are cleared by the processor whenever an exception occurs.
C.7.2.1 Data and Instruction TLB Miss Address Registers (DMISS and IMISS)
The DMISS and IMISS registers have the same format as shown in Figure C-6. They are loaded automatically upon a data or instruction TLB miss. The DMISS and IMISS contain the effective page address of the access that caused the TLB miss exception. The contents are used by the processor when calculating the values of HASH1 and HASH2, and by the tlbld and tlbli instructions when loading a new TLB entry. Note that the MPC755 always loads a big-endian address into the DMISS register. These registers are read-only to the software.
[Register diagram: bits 0-31, Effective Page Address]
Figure C-6. DMISS and IMISS Registers
C.7.2.2 Data and Instruction TLB Compare Registers (DCMP and ICMP)
The DCMP and ICMP registers are shown in Figure C-7. These registers contain the first word in the required PTE. The contents are constructed automatically from the contents of the segment registers and the effective address (DMISS or IMISS) when a TLB miss exception occurs. Each PTE read from the tables in memory during the table search process should be compared with this value to determine whether or not the PTE is a match. Upon execution of a tlbld or tlbli instruction, the contents of the DCMP or ICMP register are loaded into the first word of the selected TLB entry.
[Register diagram: bit 0, V; bits 1-24, VSID; bit 25, H; bits 26-31, API]
Figure C-7. DCMP and ICMP Registers
Table C-18 describes the bit settings for the DCMP and ICMP registers.
Table C-18. DCMP and ICMP Bit Settings
Bit | Name | Description
0 | V | Valid bit. Set by the processor on a TLB miss exception.
1-24 | VSID | Virtual segment ID. Copied from VSID field of corresponding segment register.
25 | H | Hash function identifier. Cleared by the processor on a TLB miss exception.
26-31 | API | Abbreviated page index. Copied from API of effective address.
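The field layout in Table C-18 can be illustrated with a short model. The following is a minimal sketch, not part of the MPC755 definition: the function name tlb_compare_value is hypothetical, and the sketch assumes that the API field is taken from effective address bits 4-9, as in the PowerPC 32-bit page table format. The secondary flag models what the table search software does when it sets the H bit for the second pass.

```python
def tlb_compare_value(vsid, ea, secondary=False):
    """Model the first PTE word laid out as in Table C-18 (hypothetical helper)."""
    valid = 1 << 31                      # bit 0: V, set on a TLB miss
    vsid_field = (vsid & 0xFFFFFF) << 7  # bits 1-24: VSID from the segment register
    h_bit = 0x40 if secondary else 0     # bit 25: H, cleared by hardware on a miss
    api = (ea >> 22) & 0x3F              # bits 26-31: assumed to be EA bits 4-9
    return valid | vsid_field | h_bit | api


primary = tlb_compare_value(0x000001, 0x0FC00000)
secondary = tlb_compare_value(0x000001, 0x0FC00000, secondary=True)
print(hex(primary), hex(secondary))
```

The value 0x40 used for the H bit is the same constant that the example handlers OR into the compare value before searching the secondary PTEG.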
C.7.2.3 Primary and Secondary Hash Address Registers (HASH1 and HASH2)
HASH1 and HASH2 contain the physical addresses of the primary and secondary PTEGs for the access that caused the TLB miss exception. Only bits 7-25 differ between them. For convenience, the processor automatically constructs the full physical address by routing bits 0-6 of SDR1 into HASH1 and HASH2 and clearing the lower six bits. These registers are read-only and are constructed from the contents of the DMISS or IMISS register. The format for the HASH1 and HASH2 registers is shown in Figure C-8.
[Register diagram: bits 0-6, HTABORG; bits 7-25, Hashed Page Address; bits 26-31, reserved (000000)]
Figure C-8. HASH1 and HASH2 Registers
Table C-19 describes the bit settings of the HASH1 and HASH2 registers.
Table C-19. HASH1 and HASH2 Bit Settings
Bit | Name | Description
0-6 | HTABORG[0-6] | Copy of the upper 7 bits of the HTABORG field from SDR1
7-25 | Hashed page address | Address bits 7-25 of the PTEG to be searched
26-31 | -- | Reserved
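To show how the hashed page address field relates to the effective address, the following sketch models the PTEG address calculation. It is an illustration under stated assumptions rather than a description of the MPC755 hardware: it assumes the standard PowerPC 32-bit hashing function (the low-order 19 bits of the VSID XORed with the 16-bit page index, with the secondary hash being the one's complement) and the SDR1 layout of HTABORG in bits 0-15 and HTABMASK in bits 23-31. The function name pteg_address is hypothetical.

```python
def pteg_address(sdr1, vsid, ea, secondary=False):
    """Model a HASH1/HASH2 value (assumes the PowerPC 32-bit hash function)."""
    page_index = (ea >> 12) & 0xFFFF            # EA bits 4-19
    hash_value = (vsid & 0x7FFFF) ^ page_index  # primary hash, 19 bits
    if secondary:
        hash_value = ~hash_value & 0x7FFFF      # secondary hash: one's complement
    htaborg = sdr1 & 0xFFFF0000                 # SDR1 bits 0-15
    htabmask = sdr1 & 0x1FF                     # SDR1 bits 23-31
    upper = ((hash_value >> 10) & htabmask) << 16  # masked upper 9 hash bits
    lower = (hash_value & 0x3FF) << 6              # lower 10 hash bits
    return htaborg | upper | lower              # low six bits are always zero


print(hex(pteg_address(0x00400000, 0x1, 0x00003000)))
print(hex(pteg_address(0x00400000, 0x1, 0x00003000, secondary=True)))
```

Because each PTEG holds eight 8-byte PTEs, the returned address is always 64-byte aligned, which matches the cleared lower six bits described above.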
C.7.2.4 Required Physical Address Register (RPA)
The RPA is shown in Figure C-9. During a page table search operation, the software must load the RPA with the second word of the correct PTE. When the tlbld or tlbli instruction is executed, data from the IMISS and ICMP (or DMISS and DCMP) and the RPA registers is merged and loaded into the selected TLB entry. The TLB entry is selected by the effective address of the access (loaded by the table search software from the DMISS or IMISS register) and the SRR1[WAY] bit.
[Register diagram: bits 0-19, RPN; bits 20-22, reserved (000); bit 23, R; bit 24, C; bits 25-28, WIMG; bit 29, reserved (0); bits 30-31, PP]
Figure C-9. Required Physical Address (RPA) Register
Table C-20 describes the bit settings of the RPA register.
Table C-20. RPA Bit Settings
Bit | Name | Description
0-19 | RPN | Physical page number from PTE
20-22 | -- | Reserved
23 | R | Referenced bit from PTE
24 | C | Changed bit from PTE
25-28 | WIMG | Memory/cache access attribute bits
29 | -- | Reserved
30-31 | PP | Page protection bits from PTE
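The bit assignments in Table C-20 translate into the following shifts and masks. This is a minimal sketch for illustration only; the function name decode_rpa and the dictionary keys are hypothetical.

```python
def decode_rpa(word):
    """Split a 32-bit RPA value (second PTE word) into the Table C-20 fields."""
    return {
        "rpn": (word >> 12) & 0xFFFFF,  # bits 0-19: physical page number
        "r": (word >> 8) & 0x1,         # bit 23: referenced
        "c": (word >> 7) & 0x1,         # bit 24: changed
        "wimg": (word >> 3) & 0xF,      # bits 25-28: memory/cache attributes
        "pp": word & 0x3,               # bits 30-31: page protection
    }


print(decode_rpa(0x12345192))
```

The masks 0x100 (R) and 0x80 (C) implied by these shifts are the constants used by the example handlers when they test and set the referenced and changed bits.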
C.7.3 Software Table Search Operation
When a TLB miss occurs and software table searching is enabled, the instruction or data MMU loads the IMISS or DMISS register, respectively, with the effective address of the access. The processor completes all instructions dispatched prior to the exception, status information is saved in SRR1, and one of the three TLB miss exceptions is taken. In addition, the processor loads the ICMP or DCMP register with the value to be compared with the first word of PTEs in the tables in memory.
The software should then access the first PTE at the address pointed to by HASH1. The first word of the PTE should be loaded and compared to the contents of DCMP or ICMP. If there is a match, then the required PTE has been found and the second word of the PTE is loaded from memory into the RPA register. Then the tlbli or tlbld instruction is executed, which loads the contents of the ICMP (or DCMP) and RPA registers into the selected TLB entry. The TLB entry is selected by the effective address of the access and the SRR1[WAY] bit. If the compare did not result in a match, however, the PTEG address is incremented to point to the next PTE in the table and the above sequence is repeated. If none of the eight PTEs in the primary PTEG matches, the sequence is then repeated using the secondary PTEG (at the address contained in HASH2). If the PTE is also not found in the eight entries of the secondary page table, a page fault condition exists, and a page fault exception must be synthesized. Thus the appropriate bits must be set in SRR1 (or DSISR) and the TLB miss handler must branch to either the ISI or DSI exception handler, which handles the page fault condition.
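The search order described above can be sketched in a few lines. This is a simplified model, not the handler itself: the function name search_pteg and the read_word callback are hypothetical, memory is modelled as a dictionary of word addresses, and the sketch assumes (as the example handlers do) that the H bit, mask 0x40, is set in the compare value before the secondary PTEG is searched.

```python
def search_pteg(read_word, hash1, hash2, compare_value):
    """Return (pte_address, second_word) for the matching PTE, or None."""
    passes = ((hash1, compare_value), (hash2, compare_value | 0x40))
    for pteg, wanted in passes:
        for index in range(8):            # eight PTEs per PTEG
            pte = pteg + 8 * index        # each PTE is two words (8 bytes)
            if read_word(pte) == wanted:
                return pte, read_word(pte + 4)
    return None                           # page fault condition


memory = {0x1010: 0x800000BF, 0x1014: 0x12345012,
          0x2038: 0x800001C1, 0x203C: 0x00ABC010}
read = lambda address: memory.get(address, 0)
print(search_pteg(read, 0x1000, 0x2000, 0x800000BF))
```

A result of None corresponds to the case in which the handler must synthesize a page fault and branch to the ISI or DSI exception handler.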
C.7.3.1 Flow for Example Exception Handlers
This section provides a flow diagram outlining some example software that can be used to handle the three TLB miss exceptions. Figure C-10 shows the flow for the example TLB miss exception handlers. The flow shown is common for the three exception handlers, except that the IMISS and ICMP registers are used for the instruction TLB miss exception while the DMISS and DCMP registers are used for the two data TLB miss exceptions. Also, for the cases of store instructions that cause either a TLB miss or require a table search operation to update the C bit, the flow shows that the C bit is set in both the TLB entry and the PTE in memory. Finally, in the case of a page fault (no PTE found in the table search operation), the setup for the ISI or DSI exception is slightly different for these two cases. Figure C-11 shows the flow for checking the R and C bits and setting them appropriately. Figure C-12 shows the flow for synthesizing a page fault exception when no PTE is found. Figure C-13 shows the flow for managing the cases of a TLB miss on an instruction access to guarded memory, and a TLB miss when C = 0 and a protection violation exists. The set up for these protection violation exceptions is very similar to that of page fault conditions (as shown in Figure C-12) except that different bits in SRR1 (and DSISR) are set.
[Flow diagram. Recoverable steps:]
* TLB miss exception: save old counter, CR0 bits, and 4 GPRs.
* Set counter: cnt <- 8. Load primary PTEG pointer: ptr <- HASH1 - 8. compare_value <- ICMP/DCMP.
* Read lower word of next PTE from memory: ptr <- ptr + 8; temp <- (ptr). cnt <- cnt - 1.
* If temp = compare_value: read upper word of PTE: temp <- (ptr - 4). If the access is an instruction access and temp[G] = 1, set up for protection violation exception (see Figure C-13). Otherwise: RPA <- temp; check R, C bits and set as needed (see Figure C-11); load TLB entry for the IMISS/DMISS address with tlbli (or tlbld); restore old counter and CR0 bits; return to executing program: rfi.
* Otherwise, if cnt is not 0, read the next PTE.
* Otherwise (cnt = 0), if compare_value[H] is not 1: set counter: cnt <- 8; load secondary PTEG pointer: ptr <- HASH2 - 8; continue the search.
* Otherwise (compare_value[H] = 1): secondary hash complete; set up for page fault exception (see Figure C-12).
Figure C-10. Flow for Example Software Table Search Operation
[Flow diagram. Recoverable steps:]
* Entry: check R, C bits and set as needed.
* If the handler is not for a data store operation: set R bit: temp <- temp OR 0x100; store byte 7 of PTE to memory: (ptr - 2) <- temp[byte 7]; return to TLB miss exception flow (see Figure C-10).
* If the handler is for a data store operation and temp[C] = 0: check protection.
  - pp = 10 or 11: if pp = 11, set up for protection violation (see Figure C-13); if pp = 10, set R, C bits.
  - pp = 00 or 01: if SRR1[KEY] = 1, set up for protection violation (see Figure C-13); otherwise set R, C bits.
* Set R, C bits: temp <- temp OR 0x180; store bytes 6, 7 of PTE to memory: (ptr - 2) <- temp[bytes 6, 7]; return to TLB miss exception flow (see Figure C-10).
* Otherwise: return to TLB miss exception flow (see Figure C-10).
Figure C-11. Check and Set R and C Bit Flow
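The decisions in this flow can be modelled as a small function. The sketch below follows the example handler code in Section C.7.3.2 rather than any hardware definition; the function name check_rc is hypothetical, and returning None to signal a protection violation is an illustrative convention.

```python
def check_rc(pte_word1, is_store, key):
    """Return the updated second PTE word, or None for a protection violation."""
    if not is_store:
        return pte_word1 | 0x100        # load or instruction fetch: set R only
    if pte_word1 & 0x80:
        return pte_word1                # C already set: nothing to update
    pp = pte_word1 & 0x3
    if pp == 0b11:
        return None                     # read-only page
    if pp in (0b00, 0b01) and key:
        return None                     # store not permitted when SRR1[KEY] = 1
    return pte_word1 | 0x180            # set both R and C


print(hex(check_rc(0x12345002, is_store=True, key=0)))
```

Only the cases that reach the "set R, C bits" step modify the PTE in memory; a store to a page whose C bit is already set leaves the PTE unchanged in the example handlers.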
[Flow diagram. Recoverable steps:]
* Entry: set up for page fault exception.
* Instruction TLB miss handlers: clear upper bits of SRR1: SRR1 <- SRR1 AND 0xFFFF; SRR1[1] <- 1; restore CR0 bits and GPRs; branch to ISI exception handler.
* Data TLB miss handlers: DSISR[6] <- SRR1[15]; clear upper bits of SRR1: SRR1 <- SRR1 AND 0xFFFF; DSISR[1] <- 1; dtemp <- DMISS; if SRR1[31] = 1 (little-endian mode), dtemp <- dtemp XOR 0x07; DAR <- dtemp; restore CR0 bits and GPRs; branch to DSI exception handler.
Figure C-12. Page Fault Setup Flow
[Flow diagram. Recoverable steps:]
* Entry: set up for protection violation exceptions.
* Instruction TLB miss handler (instruction access to guarded memory): clear upper bits of SRR1: SRR1 <- SRR1 AND 0xFFFF; SRR1[4] <- 1; restore CR0 bits and GPRs; branch to ISI exception handler.
* Data TLB miss handlers (data access to protected memory; C = 0): DSISR[6] <- SRR1[15]; clear upper bits of SRR1: SRR1 <- SRR1 AND 0xFFFF; DSISR[4] <- 1; dtemp <- DMISS; if SRR1[31] = 1 (little-endian mode), dtemp <- dtemp XOR 0x07; DAR <- dtemp; restore CR0 bits and GPRs; branch to DSI exception handler.
Figure C-13. Setup for Protection Violation Exceptions
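The register values produced by the data-side setup in Figures C-12 and C-13 can be sketched as follows. This is an illustrative model only; the function name synthesize_dsi is hypothetical, and bit numbers follow the big-endian convention used in this manual (bit 0 is the most-significant bit).

```python
def synthesize_dsi(srr1, dmiss, protection_violation):
    """Return (new_srr1, dsisr, dar) for a synthesized DSI exception."""
    dsisr = ((srr1 >> 16) & 0x1) << 25  # DSISR[6] <- SRR1[15]
    if protection_violation:
        dsisr |= 0x08000000             # DSISR[4] <- 1 (Figure C-13)
    else:
        dsisr |= 0x40000000             # DSISR[1] <- 1 (Figure C-12)
    new_srr1 = srr1 & 0xFFFF            # clear upper bits of SRR1
    dar = dmiss
    if new_srr1 & 0x1:                  # SRR1[31] = 1: little-endian mode
        dar ^= 0x07                     # undo the address modification
    return new_srr1, dsisr, dar


print([hex(v) for v in synthesize_dsi(0x00019001, 0x00001000, False)])
```

The instruction-side setup is simpler: only SRR1 is changed, with SRR1[1] (0x40000000) flagging a page fault and SRR1[4] (0x08000000) flagging a guarded-memory violation.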
C.7.3.2 Code for Example Exception Handlers
This section provides some assembly language examples that implement the flow diagrams described above. Note that although these routines fit into a few cache lines, they are supplied only as a functional example; they could be further optimized for faster performance.
# TLB software load for MPC755
#
# New Instructions:
#   tlbld - write the dtlb with the pte in rpa reg
#   tlbli - write the itlb with the pte in rpa reg
# New SPRs
#   dmiss - address of dstream miss
#   imiss - address of istream miss
#   hash1 - returns primary hash PTEG address
#   hash2 - returns secondary hash PTEG address
#   iCmp  - returns the primary istream compare value
#   dCmp  - returns the primary dstream compare value
#   rpa   - the second word of pte used by tlblx
#
#
# there are three flows.
#   tlbDataMiss  - tlb miss on data load
#   tlbCeq0      - tlb miss on data store or store with tlb change bit == 0
#   tlbInstrMiss - tlb miss on instruction fetch
#+
# place labels for rel branches
#
#
        .machine PPC_755
# gpr r0..r3 are saved into SPRG4-7
        .set    r0, 0
        .set    r1, 1
        .set    r2, 2
        .set    r3, 3
        .set    dMiss, 976
        .set    dCmp, 977
        .set    hash1, 978
        .set    hash2, 979
        .set    iMiss, 980
        .set    iCmp, 981
        .set    rpa, 982
        .set    c0, 0
        .set    dar, 19
        .set    dsisr, 18
        .set    srr0, 26
        .set    srr1, 27
        .set    sprg4, 276
        .set    sprg5, 277
        .set    sprg6, 278
        .set    sprg7, 279
        .csect  tlbmiss[PR]
vec0:
        .globl  vec0
        .org    vec0+0x300
vec300:
        .org    vec0+0x400
vec400:
#+
# Instruction TLB miss flow
# Entry:
#   Vec = 1000
#   srr0  -> address of instruction that missed
#   srr1  -> 0:3=cr0 4=lru way bit 16:31 = saved MSR
#   iMiss -> ea that missed
#   iCmp  -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# Register usage:
#   Existing values of r0-r3 saved into sprg4-sprg7
#   r0-r3 used in the exception handler as follows
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
        .org    vec0+0x1000
tlbInstrMiss:
        mtspr   sprg4, r0         # save r0 into sprg4
        mtspr   sprg5, r1         # save r1 into sprg5
        mtspr   sprg6, r2         # save r2 into sprg6
        mtspr   sprg7, r3         # save r3 into sprg7
        mfspr   r2, hash1         # get first pointer
        addi    r1, 0, 8          # load 8 for counter
        mfctr   r0                # save counter
        mfspr   r3, iCmp          # get first compare value
        addi    r2, r2, -8        # pre dec the pointer
im0:    mtctr   r1                # load counter
im1:    lwzu    r1, 8(r2)         # get next pte
        cmpw    c0, r1, r3        # see if found pte
        bdnzf   2, im1            # dec count br if cmp ne and if count not zero
        bne     instrSecHash      # if not found set up second hash or exit
        lwz     r1, +4(r2)        # load tlb entry lower-word
        andi.   r3, r1, 8         # check G-bit
        bne     doISIp            # if guarded, take an ISI
        mtctr   r0                # restore counter
        mfspr   r0, iMiss         # get the miss address for the tlbli
        mfspr   r3, srr1          # get the saved cr0 bits
        mtcrf   0x80, r3          # restore CR0
        mtspr   rpa, r1           # set the pte
        ori     r1, r1, 0x100     # set reference bit
        srwi    r1, r1, 8         # get byte 7 of pte
        tlbli   r0                # load the itlb
        stb     r1, +6(r2)        # update page table
        mfspr   r0, sprg4         # restore old value of r0
        mfspr   r1, sprg5         # restore old value of r1
        mfspr   r2, sprg6         # restore old value of r2
        mfspr   r3, sprg7         # restore old value of r3
        rfi                       # return to executing program
#+
# Register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
instrSecHash:
        andi.   r1, r3, 0x0040    # see if we have done second hash
        bne     doISI             # if so, go to ISI exception
        mfspr   r2, hash2         # get the second pointer
        ori     r3, r3, 0x0040    # change the compare value
        addi    r1, 0, 8          # load 8 for counter
        addi    r2, r2, -8        # pre dec for update on load
        b       im0               # try second hash
#+
# entry Not Found: synthesize an ISI exception
# guarded memory protection violation: synthesize an ISI exception
# Entry:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
doISIp: mfspr   r3, srr1          # get srr1
        andi.   r2, r3, 0xFFFF    # clean upper srr1
        addis   r2, r2, 0x0800    # or in srr1<4> = 1 to flag prot violation
        b       isi1
doISI:  mfspr   r3, srr1          # get srr1
        andi.   r2, r3, 0xFFFF    # clean srr1
        addis   r2, r2, 0x4000    # or in srr1<1> = 1 to flag pte not found
isi1:   mtctr   r0                # restore counter
        mtspr   srr1, r2          # set srr1
        mtcrf   0x80, r3          # restore CR0
        mfspr   r0, sprg4         # restore old value of r0
        mfspr   r1, sprg5         # restore old value of r1
        mfspr   r2, sprg6         # restore old value of r2
        mfspr   r3, sprg7         # restore old value of r3
        b       vec400            # go to instr. access exception
#
#+
# Data TLB miss flow
# Entry:
#   Vec = 1100
#   srr0  -> address of instruction that caused data tlb miss
#   srr1  -> 0:3=cr0 4=lru way bit 5=1 if store 16:31 = saved MSR
#   dMiss -> ea that missed
#   dCmp  -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# Register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
        .csect  tlbmiss[PR]
        .org    vec0+0x1100
tlbDataMiss:
        mtspr   sprg4, r0         # save r0 into sprg4
        mtspr   sprg5, r1         # save r1 into sprg5
        mtspr   sprg6, r2         # save r2 into sprg6
        mtspr   sprg7, r3         # save r3 into sprg7
        mfspr   r2, hash1         # get first pointer
        addi    r1, 0, 8          # load 8 for counter
        mfctr   r0                # save counter
        mfspr   r3, dCmp          # get first compare value
        addi    r2, r2, -8        # pre dec the pointer
dm0:    mtctr   r1                # load counter
dm1:    lwzu    r1, 8(r2)         # get next pte
        cmpw    c0, r1, r3        # see if found pte
        bdnzf   2, dm1            # dec count br if cmp ne and if count not zero
        bne     dataSecHash       # if not found set up second hash or exit
        lwz     r1, +4(r2)        # load tlb entry lower-word
        mtctr   r0                # restore counter
        mfspr   r0, dMiss         # get the miss address for the tlbld
        mfspr   r3, srr1          # get the saved cr0 bits
        mtcrf   0x80, r3          # restore CR0
        mtspr   rpa, r1           # set the pte
        ori     r1, r1, 0x100     # set reference bit
        srwi    r1, r1, 8         # get byte 7 of pte
        tlbld   r0                # load the dtlb
        stb     r1, +6(r2)        # update page table
        mfspr   r0, sprg4         # restore old value of r0
        mfspr   r1, sprg5         # restore old value of r1
        mfspr   r2, sprg6         # restore old value of r2
        mfspr   r3, sprg7         # restore old value of r3
        rfi                       # return to executing program
#+
# Register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
dataSecHash:
        andi.   r1, r3, 0x0040    # see if we have done second hash
        bne     doDSI             # if so, go to DSI exception
        mfspr   r2, hash2         # get the second pointer
        ori     r3, r3, 0x0040    # change the compare value
        addi    r1, 0, 8          # load 8 for counter
        addi    r2, r2, -8        # pre dec for update on load
        b       dm0               # try second hash
#
#+
# C=0 in dtlb and dtlb miss on store flow
# Entry:
#   Vec = 1200
#   srr0  -> address of store that caused the exception
#   srr1  -> 0:3=cr0 4=lru way bit 5=1 16:31 = saved MSR
#   dMiss -> ea that missed
#   dCmp  -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# Register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
        .csect  tlbmiss[PR]
        .org    vec0+0x1200
tlbCeq0:
        mtspr   sprg4, r0         # save r0 into sprg4
        mtspr   sprg5, r1         # save r1 into sprg5
        mtspr   sprg6, r2         # save r2 into sprg6
        mtspr   sprg7, r3         # save r3 into sprg7
        mfspr   r2, hash1         # get first pointer
        addi    r1, 0, 8          # load 8 for counter
        mfctr   r0                # save counter
        mfspr   r3, dCmp          # get first compare value
        addi    r2, r2, -8        # pre dec the pointer
ceq0:   mtctr   r1                # load counter
ceq1:   lwzu    r1, 8(r2)         # get next pte
        cmpw    c0, r1, r3        # see if found pte
        bdnzf   2, ceq1           # dec count br if cmp ne and if count not zero
        bne     cEq0SecHash       # if not found set up second hash or exit
        lwz     r1, +4(r2)        # load tlb entry lower-word
        andi.   r3, r1, 0x80      # check the C-bit
        beq     cEq0ChkProt       # if (C==0) go check protection modes
ceq2:   mtctr   r0                # restore counter
        mfspr   r0, dMiss         # get the miss address for the tlbld
        mfspr   r3, srr1          # get the saved cr0 bits
        mtcrf   0x80, r3          # restore CR0
        mtspr   rpa, r1           # set the pte
        tlbld   r0                # load the dtlb
        mfspr   r0, sprg4         # restore old value of r0
        mfspr   r1, sprg5         # restore old value of r1
        mfspr   r2, sprg6         # restore old value of r2
        mfspr   r3, sprg7         # restore old value of r3
        rfi                       # return to executing program
#+
# Register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
cEq0SecHash:
        andi.   r1, r3, 0x0040    # see if we have done second hash
        bne     doDSI             # if so, go to DSI exception
        mfspr   r2, hash2         # get the second pointer
        ori     r3, r3, 0x0040    # change the compare value
        addi    r1, 0, 8          # load 8 for counter
        addi    r2, r2, -8        # pre dec for update on load
        b       ceq0              # try second hash
#+
# entry found and PTE(c-bit==0):
# (check protection before setting PTE(c-bit)
# Register usage:
#   r0 is saved counter
#   r1 is PTE entry
#   r2 is pointer to pteg
#   r3 is trashed
#
cEq0ChkProt:
        rlwinm. r3, r1, 30, 0, 1  # test PP
        bge-    chk0              # if (PP==00 or PP==01) goto chk0:
        andi.   r3, r1, 1         # test PP[0]
        beq+    chk2              # return if PP[0]==0
        b       doDSIp            # else DSIp
chk0:   mfspr   r3, srr1          # get old msr
        andis.  r3, r3, 0x0008    # test the KEY bit (SRR1-bit 12)
        beq     chk2              # if (KEY==0) goto chk2:
        b       doDSIp            # else DSIp
chk2:   ori     r1, r1, 0x180     # set reference and change bit
        sth     r1, 6(r2)         # update page table
        b       ceq2              # and back we go
#
#+
# entry Not Found: synthesize a DSI exception
# Entry:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#
doDSI:  mfspr   r3, srr1          # get srr1
        rlwinm  r1, r3, 9, 6, 6   # get srr1 to bit 6 for load/store, zero rest
        addis   r1, r1, 0x4000    # or in dsisr<1> = 1 to flag pte not found
        b       dsi1
doDSIp: mfspr   r3, srr1          # get srr1
        rlwinm  r1, r3, 9, 6, 6   # get srr1 to bit 6 for load/store, zero rest
        addis   r1, r1, 0x0800    # or in dsisr<4> = 1 to flag prot violation
dsi1:   mtctr   r0                # restore counter
        andi.   r2, r3, 0xFFFF    # clear upper bits of srr1
        mtspr   srr1, r2          # set srr1
        mtspr   dsisr, r1         # load the dsisr
        mfspr   r1, dMiss         # get miss address
        rlwinm. r2, r2, 0, 31, 31 # test LE bit
        beq     dsi2              # if little endian then:
        xori    r1, r1, 0x07      # de-mung the data address
dsi2:   mtspr   dar, r1           # put in dar
        mtcrf   0x80, r3          # restore CR0
        mfspr   r0, sprg4         # restore old value of r0
        mfspr   r1, sprg5         # restore old value of r1
        mfspr   r2, sprg6         # restore old value of r2
        mfspr   r3, sprg7         # restore old value of r3
        b       vec300            # branch to DSI exception
C.8 MPC755 Instruction Timing (Chapter 6)
The instruction timing of the MPC755 is identical to that of the MPC750 except for addition of the new tlbli and tlbld instructions. Table C-21 provides latencies for the new tlbli and tlbld instructions.
Table C-21. TLB Load and Store Instruction Latencies
Primary Opcode | Extended Opcode | Mnemonic | Execution Unit | Clock Cycles
31 | 978 | tlbld | LSU | 2&
31 | 1010 | tlbli | LSU | 3&
Note: Cycle times marked with "&" require a variable number of cycles due to serialization.
C.9 MPC755 Signal Descriptions (Chapter 7)
This section describes two new signals that select the I/O voltages for the system bus (BVSEL) and the L2 interface (L2VSEL) as described in Table C-22. Refer to the MPC755 Hardware Specification for more detailed information about these signals. All other MPC755 signals operate the same as the MPC750 signals.
Table C-22. Voltage-Select Signal Descriptions
Signal | Comments
BVSEL, L2VSEL | BVSEL and L2VSEL are assigned to two unused BGA positions on the MPC755 360-pin and MPC745 255-pin BGA footprint. Internal pull-ups are provided to default to MPC750-compatible I/O voltages if unconnected.
C.10 MPC755 System Interface Operation (Chapter 8)
This section describes the MPC755 embedded processor bus interface and how its operation differs from the MPC750. It shows how the signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers and describes how the 32-bit bus mode is implemented on the MPC755.
C.10.1 MPC755 System Interface Overview
The system interface prioritizes requests for bus operations from the instruction and data caches, and performs bus operations in accordance with the 60x bus protocol. It includes address register queues, prioritization logic, and a bus control unit. The system interface
latches snoop addresses for snooping in the data cache and in the address register queues, and for reservations controlled by the Load Word and Reserve Indexed (lwarx) and Store Word Conditional Indexed (stwcx.) instructions, and maintains the touch load address for the cache. The interface allows one level of pipelining; that is, with certain restrictions described later, there can be two outstanding transactions at any given time. Accesses are prioritized with load operations preceding store operations. Instructions are automatically fetched from the memory system into the instruction unit where they are dispatched to the execution units at a peak rate of two instructions per clock. Conversely, load and store instructions explicitly specify the movement of operands to and from the integer and floating-point register files and the memory system. When the MPC755 encounters an instruction or data access, it calculates the logical address and uses the low-order address bits to check for a hit in the on-chip, 32-Kbyte instruction and data caches. During cache lookup, the instruction and data memory management units (MMUs) use the higher-order address bits to calculate the virtual address, from which they calculate the physical address. The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred in the L1 instruction or data cache. If the access misses in the corresponding cache, the physical address is used to access the L2 cache tags (if the L2 cache is enabled). If no match is found in the L2 cache tags, the physical address is used to access system memory. 
In addition to the loads, stores, and instruction fetches, the MPC755 performs hardware table search operations following TLB misses; L2 cache cast-out operations when least-recently used cache lines are written to memory after a cache miss; and cache-line snoop push-out operations when a modified cache line experiences a snoop hit from another bus master. Figure C-1 shows the address path from the execution units and instruction fetcher through the translation logic to the caches and system interface logic. The MPC755 uses separate address and data buses and a variety of control and status signals for performing reads and writes. The address bus is 32 bits and the data bus is 32 or 64 bits. The interface is synchronous--all MPC755 inputs are sampled, and all outputs are driven from the rising edge of the bus clock. The processor runs at a multiple of the system bus-clock speed. The MPC755 core operates at 1.9-2.1 volts, and the I/O signals operate at 1.8 or 3.3 volts.
C.10.2 Address Bus Pipelining
The MPC750 and MPC755 function identically in that the address bus pipelines an instruction transaction before previous data tenures complete for a data transaction. Conversely, the processor performs address bus pipelining for a data transaction following an instruction transaction. However, address bus pipelining does not occur for two consecutive instruction or two consecutive data transactions. Note that this behavior is not documented in Chapter 8, "System Interface Operation."
C.10.3 Bus Clocking
Like the MPC750, the MPC755 requires a single system clock input (SYSCLK), which is used by the phase-locked loop (PLL) circuit to generate a master clock for all of the CPU circuitry (including the bus interface circuitry). The master clock is frequency- and phase-locked to the SYSCLK input and may be set to integer or half-clock multiples of the SYSCLK frequency. Refer to the MPC755 Hardware Specification for the ratios supported.
C.10.4 32-Bit Data Bus Mode
The MPC755 supports an optional 32-bit data bus mode in which the data bus high and corresponding parity signals (DH[0:31] and DP[0:3]) are used, and the data bus low and corresponding parity signals (DL[0:31] and DP[4:7]) are ignored. The following list summarizes the functionality of the 32-bit data bus mode on the MPC755:
* Data tenures of 1, 2, and 8 beats supported (1 to 4 bytes per beat).
* The address and transfer attribute information is unchanged from 64-bit mode.
* The TBST and TSIZ[0:2] signals must be reinterpreted for burst size.
* Data termination is the same for each data beat using TA, DRTRY, and TEA.
* 32-bit mode configured at power-on (hard reset) through the TLBISYNC signal.
The 32-bit data bus mode operates the same as the 64-bit data bus mode with the exception of the byte lanes involved in the transfer and the number of data beats that are performed. Only byte lanes 0 through 3 are used, corresponding to the data bus signals DH[0:31] and DP[0:3]. Byte lanes 4 through 7 (corresponding to DL[0:31] and DP[4:7]) are never used in this mode. The unused data bus signals are not sampled by the processor during read operations, and they are driven low during write operations.
The number of data beats required for a data tenure in 32-bit data bus mode is one, two, or eight depending on the size of the transaction and the cache mode for the address. Data transactions of one or two data beats are performed for cache-inhibited load/store or write-through store operations. These transactions do not assert the TBST signal even though a two-beat burst may be performed (that is, the same TBST and TSIZ[0:2] encoding as in 64-bit data bus mode). Single-beat data transactions are performed for operations of size 4 bytes or less, and double-beat data transactions are performed for 8-byte operations only. (The processor only generates an 8-byte operation for a double-word aligned load or store-double operation to or from the floating-point registers.)
Data transactions of eight data beats are performed for burst operations that load into or cast out from the MPC755 internal caches. These transactions transfer 32 bytes similarly to 64-bit mode, and they assert the TBST signal and indicate a transfer size of two (TSIZ[0:2] = 010) similar to 64-bit data bus mode.
Otherwise, the same bus protocols apply for arbitration, transfer, and termination of the address and data tenures in 32-bit data bus mode as apply in 64-bit data bus mode. Late ARTRY cancellation of the data tenure applies on the bus clock cycle after the first data beat is acknowledged (after the first TA) for word or smaller transactions, or on the bus clock cycle after the second data beat is acknowledged (after the second TA) for double-word or burst operations (or coincident with the respective TA if no-DRTRY mode is selected).
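The beat counts described above reduce to a three-way rule. The following is a minimal sketch for illustration; the function name data_beats_32bit is hypothetical and the sketch covers only the transfer sizes named in this section.

```python
def data_beats_32bit(size_bytes, burst=False):
    """Return the number of data beats for a transfer in 32-bit data bus mode."""
    if burst:
        return 8                 # 32-byte cache line, 4 bytes per beat
    if size_bytes == 8:
        return 2                 # double-word floating-point load or store
    if 1 <= size_bytes <= 4:
        return 1                 # word or smaller
    raise ValueError("unsupported non-burst transfer size")


print(data_beats_32bit(4), data_beats_32bit(8), data_beats_32bit(32, burst=True))
```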
C.10.4.1 Burst Ordering
For burst operations in 32-bit mode, a data block of 32-bytes (one cache line) is transferred in the same order as in 64-bit data bus mode with the exception that eight data beats are required to perform the transfer instead of four. For each double word of the block that is transferred, the upper word of the double word is transferred first on the data bus (on DH[0:31]), and then the lower word of the double word is transferred. Table C-23 shows the burst order for each starting address.
Table C-23. Burst Ordering

                 For Double Word Starting Address:
Data Transfer    A[27:28] = 00   A[27:28] = 01   A[27:28] = 10   A[27:28] = 11
1st Data Beat    DW0 - u         DW1 - u         DW2 - u         DW3 - u
2nd Data Beat    DW0 - l         DW1 - l         DW2 - l         DW3 - l
3rd Data Beat    DW1 - u         DW2 - u         DW3 - u         DW0 - u
4th Data Beat    DW1 - l         DW2 - l         DW3 - l         DW0 - l
5th Data Beat    DW2 - u         DW3 - u         DW0 - u         DW1 - u
6th Data Beat    DW2 - l         DW3 - l         DW0 - l         DW1 - l
7th Data Beat    DW3 - u         DW0 - u         DW1 - u         DW2 - u
8th Data Beat    DW3 - l         DW0 - l         DW1 - l         DW2 - l
Notes: A[27:28] specifies the first double word of the 32-byte block being transferred; the remaining double words to transfer must wrap around the block. A[29:31] are always 0b000 for burst transfers initiated by the MPC755. "DWx" represents the double word that would be addressed by A[27:28] = "x" if a non-burst transfer were performed. "u" and "l" represent the upper word and lower word of the double word, respectively. Each data beat is terminated with one valid assertion of TA (without DRTRY cancellation).
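The wrap-around ordering in Table C-23 can be expressed as a short computation. The sketch below is illustrative only; the function name burst_order_32bit and the convention of numbering the eight words of the cache line 0 through 7 (word 0 being the upper word of DW0) are assumptions made for the example.

```c
/* Hypothetical helper: fills order[] with the index (0..7) of the 32-bit
 * word within the 32-byte cache line that is transferred on each of the
 * eight data beats. first_dw is the value of A[27:28] (0..3).
 * Word index = 2 * double-word number + (0 for upper word, 1 for lower). */
void burst_order_32bit(unsigned first_dw, unsigned order[8])
{
    for (unsigned beat = 0; beat < 8; beat++) {
        unsigned dw   = (first_dw + beat / 2) % 4; /* double words wrap within the block */
        unsigned half = beat % 2;                  /* upper word first, then lower word */
        order[beat] = dw * 2 + half;
    }
}
```

With first_dw = 2, the resulting word order is 4, 5, 6, 7, 0, 1, 2, 3, which corresponds to the A[27:28] = 10 column of the table.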
C.10.4.2 Aligned Transfers
The aligned data transfer cases for 32-bit data bus mode are shown in Table C-24. All of the transfers require a single data beat (if cache-inhibited or write-through) except for double-word cases that require two data beats. The double-word case is only generated by the processor for load or store-double operations to/from the floating-point registers. All cache-inhibited instruction fetches are performed as word operations.
Table C-24. Aligned Data Transfers--32-Bit Data Bus Mode

                                                  Data Bus Byte Lanes
                                                  DH0...    ...DH31    DL0...    ...DL31
Program Transfer Size    TSIZ[0:2]   A[29:31]     B0  B1  B2  B3       B4  B5  B6  B7
Byte--1 Beat             001         000          A   --  --  --       x   x   x   x
Byte--1 Beat             001         001          --  A   --  --       x   x   x   x
Byte--1 Beat             001         010          --  --  A   --       x   x   x   x
Byte--1 Beat             001         011          --  --  --  A        x   x   x   x
Byte--1 Beat             001         100          A   --  --  --       x   x   x   x
Byte--1 Beat             001         101          --  A   --  --       x   x   x   x
Byte--1 Beat             001         110          --  --  A   --       x   x   x   x
Byte--1 Beat             001         111          --  --  --  A        x   x   x   x
Half-Word--1 Beat        010         000          A   A   --  --       x   x   x   x
Half-Word--1 Beat        010         010          --  --  A   A        x   x   x   x
Half-Word--1 Beat        010         100          A   A   --  --       x   x   x   x
Half-Word--1 Beat        010         110          --  --  A   A        x   x   x   x
Word--1 Beat             100         000          A   A   A   A        x   x   x   x
Word--1 Beat             100         100          A   A   A   A        x   x   x   x
Double Word--1st Beat    000         000          A   A   A   A        x   x   x   x
Double Word--2nd Beat    000         000          A   A   A   A        x   x   x   x
Notes: "A": Byte lanes that are read or written during that bus transaction. "--": These lanes are ignored during read transactions and driven with undefined data during write transactions. "x": Byte lanes are not used in 32-bit data bus mode. They are not sampled by the MPC755 during reads and are driven low during writes.
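The lane pattern in Table C-24 follows a simple rule: because only DH[0:31] is used, the two low-order address bits select the first active lane. The following sketch illustrates this under stated assumptions; the function name and the choice of representing lanes as a bit mask (bit n set means lane Bn is active) are hypothetical and chosen only for the example.

```c
/* Hypothetical helper: returns a mask of the active byte lanes B0..B3 for
 * one data beat of an aligned transfer in 32-bit data bus mode.
 * Bit n of the result set means lane Bn is read or written.
 *   a29_31:     value of A[29:31] (0..7)
 *   size_bytes: 1, 2, 4, or 8 (aligned transfers only) */
unsigned aligned_lane_mask_32bit(unsigned a29_31, unsigned size_bytes)
{
    /* A double word is moved as two beats of four bytes each. */
    unsigned beat_bytes = (size_bytes > 4u) ? 4u : size_bytes;

    /* Words at offsets 0 and 4 both travel on DH[0:31], so only the
     * two low-order address bits select the starting lane. */
    unsigned first_lane = a29_31 & 3u;

    return ((1u << beat_bytes) - 1u) << first_lane;
}
```

For instance, a half-word at A[29:31] = 110 yields the mask 0xC (lanes B2 and B3), matching the corresponding table row.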
C.10.4.3 Misaligned Data Transfers
Misaligned data transfer cases operate similarly in 32-bit data bus mode as in 64-bit data bus mode with the usual exception that only the DH[0:31] data bus is used. An example of a four-byte misaligned transfer starting at each possible byte address within a double word is shown in Table C-25.
Table C-25. Misaligned Data Transfers Example--32-Bit Data Bus Mode

                                                       Data Bus Byte Lanes
                                                       DH0...    ...DH31    DL0...    ...DL31
Program Size of Word (4 Bytes)   TSIZ[0:2]   A[29:31]  B0  B1  B2  B3       B4  B5  B6  B7
Aligned                          100         000       A   A   A   A        x   x   x   x
Misaligned--1st Access           011         001       --  A   A   A        x   x   x   x
            2nd Access           001         100       A   --  --  --       x   x   x   x
Misaligned--1st Access           010         010       --  --  A   A        x   x   x   x
            2nd Access           010         100       A   A   --  --       x   x   x   x
Misaligned--1st Access           001         011       --  --  --  A        x   x   x   x
            2nd Access           011         100       A   A   A   --       x   x   x   x
Aligned                          100         100       A   A   A   A        x   x   x   x
Misaligned--1st Access           011         101       --  A   A   A        x   x   x   x
            2nd Access           001         000       A   --  --  --       x   x   x   x
Misaligned--1st Access           010         110       --  --  A   A        x   x   x   x
            2nd Access           010         000       A   A   --  --       x   x   x   x
Misaligned--1st Access           001         111       --  --  --  A        x   x   x   x
            2nd Access           011         000       A   A   A   --       x   x   x   x
Notes: "A": Byte lane read in "x": Ignored byte lane (does not need to be valid)
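The rows of Table C-25 can be reproduced by splitting a misaligned word at the next word boundary. The sketch below is an illustration under stated assumptions: the type bus_access and the function split_word_access_32bit are hypothetical names, and the sketch relies on the fact that, for the sizes in this table, the TSIZ[0:2] encoding equals the byte count (001 through 100).

```c
/* Hypothetical record of one bus transaction for the example. */
typedef struct {
    unsigned tsiz;    /* TSIZ[0:2]; equals the byte count for 1..4 bytes */
    unsigned a29_31;  /* A[29:31] driven for this access */
} bus_access;

/* Hypothetical helper: splits a 4-byte access that starts at byte offset
 * 'start' (0..7) within a double word into one or two bus accesses.
 * Returns the number of accesses written to out[]. */
int split_word_access_32bit(unsigned start, bus_access out[2])
{
    unsigned first_len = 4u - (start & 3u);  /* bytes up to the next word boundary */

    out[0].tsiz   = first_len;
    out[0].a29_31 = start & 7u;
    if (first_len == 4u)
        return 1;                            /* aligned: a single word access */

    out[1].tsiz   = 4u - first_len;          /* remaining bytes */
    out[1].a29_31 = (start + first_len) & 7u; /* wraps to 000 in the next double word */
    return 2;
}
```

A word starting at offset 5, for example, becomes a 3-byte access at A[29:31] = 101 followed by a 1-byte access at A[29:31] = 000.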
C.10.4.4 Selecting D32 Mode
The processor selects 64- or 32-bit data bus mode at power-up by sampling the state of the TLBISYNC signal at the negation of HRESET (coming out of hard reset). If the TLBISYNC signal is high (negated) at the negation of HRESET, 64-bit data mode is selected. If TLBISYNC is low (asserted), 32-bit data mode is used. For 32-bit systems not using the TLBISYNC signal, TLBISYNC can be connected to HRESET directly. Otherwise, it can be connected to a pull-up resistor to select 64-bit mode. For systems using the TLBISYNC input function, the state of HRESET must be logically combined in the TLBISYNC generation path to select the desired mode.
C.10.4.5 Signal Relationships
The signal relationships for 32-bit mode are the same as 64-bit mode. Figure C-14 and Figure C-15 show an example of an 8-beat burst transaction and a 2-beat burst transaction with DRTRY, respectively.
[Timing diagram: SYSCLK cycles 1 through 12, showing TS, ABB, ADDR, TBST, AACK, ARTRY, DBB, DH[0:31] (data beats 0 through 7), TA, DRTRY, and TEA.]
Figure C-14. 32-Bit Data Bus Mode--8-Beat Burst (No Retry Conditions)
[Timing diagram: SYSCLK cycles 1 through 12, showing TS, ABB, ADDR, TBST, AACK, ARTRY, DBB, DH[0:31] (data beats 0 and 1), TA, DRTRY, and TEA.]
Figure C-15. 32-Bit Data Bus Mode--2-Beat Burst (with DRTRY)
C.11 MPC755 L2 Cache Interface Operation (Chapter 9)
This section describes the L2 cache interface operation of the MPC755, and how it differs from the MPC750.
C.11.1 MPC755 L2 Cache Interface Overview
The MPC755 L2 cache is implemented with an on-chip, two-way set-associative tag memory, and with external synchronous SRAMs for data storage, similar to the MPC750. The external SRAMs are accessed through a dedicated L2 cache port which supports a single bank of up to 1 Mbyte of synchronous SRAMs. The L2 cache normally operates in copyback mode and supports system cache coherency through snooping. The differences from the MPC750 L2 cache interface are summarized as follows:
* Support for 4-1-1-1 PB3 synchronous burst-only SRAMs
* Additional control of the L2 interface during low-power operation
* Additional information about (and control of) the L2 DLL circuitry
* A new instruction-only mode
* Private memory capability for half or all of the L2 SRAM
* More flexible control of the L2 parity signals by allowing data or data and address parity
In addition to including the MPC755-specific information, this section supersedes Chapter 9, "L2 Cache Interface Operation." Figure C-16 shows a typical connection from the MPC755 processor L2 interface to a bank of PB3 SRAMs. See Chapter 9, "L2 Cache Interface Operation," for typical connections to other SRAM technologies. Note that the signals for the L2 interface on the MPC755 are the same as those used for the MPC750.
[Block diagram: the MPC755 L2 interface signals L2ADDR[16:0], L2DATA[0:63], L2DP[0:7], L2CE, L2WE, L2ZZ (optional), L2CLK_OUTA, L2CLK_OUTB, L2SYNC_OUT, and L2SYNC_IN connected to two PB3 SRAMs (128K x 36 each). Each SRAM has the pins Addr[16:0], Data[0:31], Parity[0:3], E, W, ADS, ADSP, and K.]
Notes:
1. For a 1-Mbyte L2, use address bits 0-16 (bit 0 is LSB).
2. For a 512-Kbyte L2, use address bits 0-15 (bit 0 is LSB).
3. For a 256-Kbyte L2, use address bits 0-14 (bit 0 is LSB).
4. External clock routing should ensure that the rising edge of the L2 clock is coincident at the K input of all SRAMs and at the L2SYNC_IN input of the MPC755. The clock `A' network only could be used, or the clock `B' network could also be used depending on loading, frequency, and number of SRAMs.
5. No pull-up resistors are normally required for the L2 interface.
6. The MPC755 supports only one bank of SRAMs.
7. For high-speed operation, no more than two loads should be presented on each L2 interface signal.
Figure C-16. Typical Synchronous 1-Mbyte L2 Cache System Using PB3 SRAM
C.11.1.1 L2 Cache Organization
The MPC750 L2 cache interface is implemented with an on-chip, two-way set-associative tag memory with 4096 tags per way, and a dedicated interface with support for up to 1 Mbyte of external synchronous SRAM for data storage. The tags are sectored to support either two cache blocks per tag entry (two sectors, 64 bytes), or four cache blocks per tag entry (four sectors, 128 bytes) depending on the L2 cache size. If the L2 cache is configured for 256 Kbytes or 512 Kbytes of external SRAM, the tags are configured for two sectors per L2 cache block. The L2 tags are configured for four sectors per L2 cache block when 1 Mbyte of external SRAM is used. Each sector (32-byte L1 cache block) in the L2 cache has its own valid and modified bits and other status bits that implement the MEI cache coherency protocol.
Table C-26 lists the data RAM organizations for the various L2 cache sizes. Table C-26 also indicates typical SRAM sizes that might be used to construct such a cache.
Table C-26. L2 Cache Sizes and Data RAM Organizations

L2 Cache Size   L2 Data Bus Size   L2 Data RAM Organization   Example SRAM Sizes
256 Kbytes      64/72 bit          32 Kbytes x 64/72          (2) 32 Kbytes x 32/36
512 Kbytes      64/72 bit          64 Kbytes x 64/72          (2) 64 Kbytes x 32/36
1 Mbyte         64/72 bit          128 Kbytes x 64/72         (2) 128 Kbytes x 32/36
Notes: The MPC755 supports only one bank of SRAMs. For very high speed operation, no more than two SRAM devices should be used.
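The relationship between cache size, data RAM depth, and the number of L2 address lines can be checked with simple arithmetic. The sketch below assumes that the "32 Kbytes x 64" style entries in Table C-26 denote the number of 64-bit entries in the data RAM; the function names are hypothetical.

```c
/* Hypothetical helper: number of 64-bit (8-byte) entries in the L2 data
 * RAM for a given cache size in bytes. Parity bits are not counted. */
unsigned l2_ram_depth(unsigned cache_bytes)
{
    return cache_bytes / 8u;
}

/* Hypothetical helper: number of L2 address lines needed to index that
 * many entries (smallest n such that 2^n >= depth). */
unsigned l2_addr_bits(unsigned cache_bytes)
{
    unsigned depth = l2_ram_depth(cache_bytes);
    unsigned bits = 0;

    while ((1u << bits) < depth)
        bits++;
    return bits;
}
```

A 1-Mbyte cache gives 131072 entries and 17 address lines, which agrees with the use of address bits 0-16 noted under Figure C-16; 512 Kbytes and 256 Kbytes give 16 and 15 lines, respectively.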
C.11.1.2 L2 Cache Control
The L2 cache control register (L2CR) allows control of L2 cache configuration and timing, byte-level data parity generation and checking, global invalidation of L2 contents, write-through operation, and L2 test support. The L2 cache interface provides two clock outputs that allow the clock inputs of the SRAMs to be driven at select frequency divisions of the processor core frequency. See the MPC755 Hardware Specifications for details about the specific frequency ratios supported. For more details about the L2CR, see Section C.11.4.1, "L2 Cache Control Register (L2CR)."
C.11.1.3 L2 Private Memory
A portion, or all, of the L2 cache can alternately be used as a private SRAM. In this way, a portion of the physical address space can be mapped into a portion of the L2 SRAM. This functionality is described in Section C.11.2.2, "L2 Private Memory Operation." When private SRAM is used and the upper bits of the physical address match the L2PM[PMBA] field, the data is written or read from the private space of the L2 SRAM instead of external memory. Note that all of the SRAM can be designated as private, or for 512 Kbytes or 1 Mbyte SRAM, half can be designated as private and half as L2 cache. See Table C-29 for all the supported combinations. Also, see Section C.11.6.5, "Cache Control Instructions and Effect on Private Memory Operation," for information on the operation of cache control instructions with respect to private memory space.
C.11.2 L2 Interface Operation
This section describes the general operation of both the L2 cache and the private memory capabilities of the L2 interface.
C.11.2.1 L2 Cache Operation
The MPC755 L2 cache is a combined instruction and data cache that receives memory requests from both L1 instruction and data caches independently. The L1 requests are generally the result of instruction fetch misses, data load or store misses, write-through operations, or cache management instructions. Each L1 request generates an address lookup in the L2 tags. If a hit occurs, the instructions or data are forwarded to the L1 cache. A miss in the L2 tags causes the L1 request to be forwarded to the 60x bus interface. The cache block received from the bus is forwarded to the L1 cache immediately, and is also loaded into the L2 cache with the tag marked valid and unmodified. If the cache block loaded into the L2 causes a new tag entry to be allocated and the current tag entry is marked valid modified, the modified sectors of the tag to be replaced are cast out from the L2 cache to the 60x bus. See Section C.11.6.4, "Other Cache Control Instructions and Effect on L2 Cache," for more information on the operation of cache control operations on the L2 cache.
C.11.2.1.1 L2 Cache Access Priorities
At any given time the L1 instruction cache may have one instruction fetch request, and the L1 data cache may have one load and two stores requesting L2 cache access. The L2 cache also services snoop requests from the 60x bus. When there are multiple pending requests to the L2 cache, snoop requests have highest priority, followed by data load and store requests (serviced on a first-in, first-out basis). Instruction fetch requests have the lowest priority in accessing the L2 cache when there are multiple accesses pending. If read requests from both the L1 instruction and data caches are pending, the L2 cache can perform hit-under-miss operations and supply the available instruction or data while a bus transaction for the previous L2 cache miss is performed. The L2 cache does not support miss-under-miss, and the second instruction fetch or data load stalls until the bus operation resulting from the first L2 miss completes.
C.11.2.1.2 L2 Cache Services
All requests to the L2 cache that are marked cacheable (even if the respective L1 cache is disabled or locked) cause a tag lookup and will be serviced if the instructions or data are in the L2 cache. Burst requests from the L1 caches and single-beat read requests that hit in the L2 cache are forwarded the instructions or data, and the L2 LRU bit for that tag is updated. Burst writes from the L1 data cache due to a castout or replacement copyback are written only to the L2 cache, and the L2 cache sector is marked modified. Designers should note that during burst transfers into and out of the L2 cache SRAM array, an address is generated by the MPC755 for each data beat.
If the L2 cache is configured as write-through, the L2 sector is marked unmodified, and the write is forwarded to the 60x bus. If the L1 castout requires a new L2 tag entry to be allocated and the current tag is marked modified, any modified sectors of the tag to be replaced are cast out of the L2 cache to the 60x bus.
Single-beat reads that miss in the L2 cache do not cause any state changes in the L2 cache and are forwarded on the 60x bus interface. Cacheable single-beat store requests marked copyback that hit in the L2 are allowed to update the L2 cache sector, but do not cause L2 cache sector allocation or deallocation. Cacheable, single-beat store requests that miss in the L2 are forwarded to the 60x bus. Single-beat store requests marked write-through (through address translation or through the configuration of L2CR[L2WT]) are written to the L2 cache if they hit and are written to the 60x bus independent of the L2 hit/miss status. If the store hits in the L2 cache, the modified/unmodified status of the tag remains unchanged.
C.11.2.1.3 L2 Cache Coherency and WIMG Bits
Unlike on the MPC750, a request to the L2 cache on the MPC755 that is marked cache-inhibited by address translation (through either the MMU or by default WIMG configuration) will hit in the L2 cache if the block has been previously loaded (and is still valid), causing a paradox condition. However, misses for cache-inhibited accesses do not cause a new entry to be allocated and do not cause any L2 cache tag state change.
C.11.2.1.4 Single-Beat Accesses to L2 Interface
The processor performs single-beat read and write accesses when the L1 instruction and/or data caches are disabled, and when the WIMG bit settings indicate that an area of memory is cache-inhibited (this case is not forwarded to the L2 interface). Additionally, single-beat writes occur to the L2 interface when that area of memory is designated as write-through.
PB2 SRAMs naturally support single-beat read and write accesses. However, the L2 interface requires 64-bit accesses to the SRAM. Therefore, for single-beat writes, the MPC750 and MPC755 automatically perform a read-modify-write operation in order to write the complete 64 bits to the L2.
PB3 SRAMs support bursting accesses only. Thus, for PB3 SRAMs, the L2 interface always automatically performs a burst read for a complete cache line from the SRAM. If a single-beat read was requested, then the appropriate double word is forwarded to the L1. Write accesses to PB3 SRAMs also require burst accesses. Thus, for a single-beat write, the L2 interface automatically performs a burst read-modify-write in order to perform the complete write burst.
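The merge step of the read-modify-write described above can be illustrated in software. The L2 controller performs this in hardware; the following is only a conceptual sketch. The function name l2_rmw_merge and its parameters are hypothetical, and big-endian byte numbering within the double word (byte 0 is the most significant) is assumed.

```c
#include <stdint.h>

/* Conceptual sketch of the merge in a read-modify-write of one 64-bit
 * SRAM entry. Not a description of the actual hardware implementation.
 *   old_dw: double word read from the SRAM
 *   data:   store data, right-justified
 *   offset: byte offset within the double word, 0..7 (byte 0 = MSB)
 *   size:   number of bytes to write, 1..8, with offset + size <= 8 */
uint64_t l2_rmw_merge(uint64_t old_dw, uint64_t data,
                      unsigned offset, unsigned size)
{
    unsigned shift = (8u - offset - size) * 8u;

    /* Avoid shifting a 64-bit value by 64 bits, which is undefined in C. */
    uint64_t mask = (size == 8u)
                        ? ~(uint64_t)0
                        : ((((uint64_t)1 << (size * 8u)) - 1u) << shift);

    /* Keep the untouched bytes and insert the new ones. */
    return (old_dw & ~mask) | ((data << shift) & mask);
}
```

For example, writing the 4-byte value 0xDEADBEEF at offset 2 into 0x1122334455667788 produces 0x1122DEADBEEF7788, which is then written back as a complete 64-bit entry.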
C.11.2.2 L2 Private Memory Operation
The L2 interface of the MPC755 can also be used as a low-latency, high-bandwidth private memory space. The private memory space is not snooped and is therefore not coherent with other processors in a system. The private space can contain instructions and data and its contents can be cached in the L1 instruction and data caches provided the accesses are marked cacheable. The private memory receives requests from both the L1 instruction cache and the L1 data cache independently. The L1 requests are generally the result of instruction misses, data load or store misses, L1 data cache castouts, write-through operations, or cache management instructions.
For all cacheable accesses, the L1 requests are looked-up in the L2 tags and compared with the corresponding PMBA bits of the L2PM. If a match occurs with L2PM[PMBA], the result of the L2 tag lookup is ignored and the request is forwarded to the external L2 SRAM interface as a private memory access.
All transactions that read or write data, except those caused by the eciwx and ecowx instructions, are allowed to hit in the private memory space, regardless of the WIMG memory/cache attribute bits. Transactions caused by the icbi, sync, tlbie, tlbsync, eieio, eciwx, and ecowx instructions never hit in the private memory space and are forwarded to the system interface. Accesses caused by the dcbi instruction that hit in the private memory space are discarded (after invalidating the L1 data cache).
The private memory space does not have coherency state information. When the L1 data cache is reloaded for a cacheable load or store, the state will be exclusive or modified, respectively. Generally, the private memory operates according to the following:
* Arbitration is shared with the L2 cache and thus uses the same priorities.
* Burst read requests from the L1 instruction or data caches that map to the private memory space are forwarded data from the L2 SRAMs designated as private memory. Cache-inhibited stores write the appropriate data to the L2 interface.
* Requests to the L2 interface that are marked cacheable by address translation (even if the respective L1 cache is locked) are serviced by the L2 interface if they map to the private memory space.
* Burst read and single-beat read requests from the L1 instruction or data caches that map to the private memory space are forwarded data from the L2 SRAMs designated as private memory.
* Burst read requests from the L1 instruction or data caches that do not map to private memory space (and miss in the L2 cache, if enabled) initiate a burst read operation from the system interface for the cache line that missed. The cache line received from the bus is forwarded to the appropriate L1 cache (and the L2 cache, if enabled).
* Normal burst writes from the L1 data cache due to castouts (also referred to as replacement copybacks) that map to the private memory space are written to the external SRAMs designated as private memory regardless of the L2CR[L2IO] setting. Burst writes that don't map to the private memory space are allocated in the L2 cache (if enabled).
Note that software-generated single-beat reads and writes directed to the private memory SRAMs are handled in the same way as described for the SRAMs as L2 cache, and read-modify-write transactions are performed automatically by the L2 controller as needed, as described in Section C.11.2.1.4, "Single-Beat Accesses to L2 Interface." See Section C.11.6.4, "Other Cache Control Instructions and Effect on L2 Cache," for more information on the operation of cache control operations on the L2 cache. However, the following apply to the private memory space:
* Cacheable stwcx. operations are handled by the L1 data cache similarly to normal cacheable stores. The L2 interface does not treat stwcx. differently than a normal cacheable store. Cache-inhibited stwcx. accesses that hit in the private memory space write the appropriate data to the L2 interface and are not forwarded to the system interface.
* dcbz operations that hit in the private memory space do not affect the data in the external SRAMs. They are handled entirely by the L1.
* dcbf operations are issued to the L2 interface after being processed by the L1 data cache. If a dcbf that hits in the L1 data cache and requires a line push hits in the private memory space, the cache line is written to the L2 interface. dcbf operations that hit in the private memory space are never forwarded to the system interface.
* dcbst instructions are issued to the L2 cache after being processed by the L1 data cache. If a dcbst that hits in the L1 data cache and requires a line push hits in the private memory space, the cache line is written to the external SRAMs. dcbst operations that hit in the private memory space are never forwarded to the system interface.
* dcbi instructions that hit in the private memory space are discarded and are never forwarded to the system interface.
* icbi instructions never affect the L2 interface and are just passed to the system interface for further processing.
* sync, eieio, eciwx, ecowx, tlbie, and tlbsync instructions pass through the L2 interface and are forwarded to the system interface for further processing.
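The handling of the individual instructions listed above can be summarized as a small decision function. This is a simplified sketch for illustration only: the enumerations and the function name are hypothetical, the address is assumed to already match the private memory region, and details such as whether a dcbf or dcbst actually requires a line push are not modeled.

```c
/* Hypothetical operation classes for the example. */
typedef enum {
    OP_LOAD_STORE, OP_STWCX, OP_DCBZ, OP_DCBF, OP_DCBST, OP_DCBI,
    OP_ICBI, OP_SYNC, OP_EIEIO, OP_ECIWX, OP_ECOWX, OP_TLBIE, OP_TLBSYNC
} pm_op;

/* Hypothetical outcomes for an operation whose address maps to the
 * private memory space. */
typedef enum {
    PM_L2_INTERFACE, /* stays on the L2 interface; SRAM accessed if data must move */
    PM_L1_ONLY,      /* handled entirely by the L1 */
    PM_DISCARD,      /* discarded, never forwarded to the system interface */
    PM_TO_SYSTEM     /* never hits private memory; forwarded to the system interface */
} pm_result;

pm_result private_memory_disposition(pm_op op)
{
    switch (op) {
    case OP_DCBZ:
        return PM_L1_ONLY;
    case OP_DCBI:
        return PM_DISCARD;
    case OP_ICBI: case OP_SYNC: case OP_EIEIO:
    case OP_ECIWX: case OP_ECOWX: case OP_TLBIE: case OP_TLBSYNC:
        return PM_TO_SYSTEM;
    default:
        /* loads, stores, stwcx., and dcbf/dcbst line pushes */
        return PM_L2_INTERFACE;
    }
}
```

The default case groups the operations that the text describes as being serviced by the L2 interface and never forwarded to the system interface.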
Note that L2 cache-related performance monitor events may not produce expected results when L2 private memory is enabled. Specifically, hits to the private memory are treated as L2 cache misses by the performance monitor. No new performance monitor events have been added to specifically support the L2 private memory.
C.11.3 L2 Clocking
The MPC755 generates the clock for the external L2 synchronous data RAMs in the same way as the MPC750. The clock frequency for the RAMs is divided down from the core clock frequency of the MPC755. The divided-down clock is then phase-adjusted by an on-chip delay-lock loop (DLL) circuit, sent out from the MPC755 to the external RAMs, and then returned as an input to the DLL so that the rising edge of the clock as seen at the external RAMs can be aligned to the clocking of the internal latches in the MPC755 L2 bus interface. The core-to-L2 frequency divisor for the L2 PLL is selected through the L2CLK bits of the L2CR register. Generally, the divisor must be chosen according to the frequency supported by the external RAMs, the internal core operating frequency, and the phase adjustment range that the L2 DLL supports. The L2 RAM frequency can be divided down from the core operating frequency as described in the MPC755 Hardware Specification. Additional supported frequency ratios for the MPC755 are also highlighted in the hardware specification.
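As an illustration of the divisor selection, the sketch below maps the L2CR[L2CLK] encodings listed in Table C-27 to an L2 clock frequency and checks the constraint, also stated in Table C-27, that the L2 clock cannot be slower than the 60x bus clock. The function names are hypothetical, frequencies are in kHz to avoid floating point, and the hardware specification remains the authority on which ratios a given part supports.

```c
/* Hypothetical helper: returns twice the core-to-L2 divisor for an
 * L2CR[L2CLK] encoding (so /1.5 is returned as 3), 0 if the L2 clock
 * and DLL are disabled, or -1 for a reserved encoding. */
int l2clk_divisor_x2(unsigned l2clk)
{
    /* 000 disabled, 001 /1, 010 /1.5, 011 reserved,
     * 100 /2, 101 /2.5, 110 /3, 111 reserved */
    static const int table[8] = { 0, 2, 3, -1, 4, 5, 6, -1 };
    return table[l2clk & 7u];
}

/* Hypothetical helper: L2 clock in kHz, or 0 if disabled or reserved. */
unsigned l2_freq_khz(unsigned core_khz, unsigned l2clk)
{
    int d = l2clk_divisor_x2(l2clk);

    if (d <= 0)
        return 0u;
    return (unsigned)(((unsigned long long)core_khz * 2u) / (unsigned)d);
}

/* Returns 1 if the resulting L2 clock is running and is not slower than
 * the 60x bus clock, 0 otherwise. */
int l2clk_setting_ok(unsigned core_khz, unsigned bus_khz, unsigned l2clk)
{
    unsigned f = l2_freq_khz(core_khz, l2clk);
    return f != 0u && f >= bus_khz;
}
```

With a 400 MHz core and a 100 MHz bus, the divide-by-3 encoding (110) gives an L2 clock of about 133 MHz and passes the check; the same encoding with a 300 MHz core and a 133 MHz bus does not.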
C.11.4 L2 Registers
This section describes the cache configuration bits in the L2 cache control register (L2CR) and the L2 cache private memory control register (L2PM).
C.11.4.1 L2 Cache Control Register (L2CR)
The L2 cache control register of the MPC755 is a read/write, supervisor-level, implementation-specific SPR used to configure and operate the L2 cache, and it is slightly different from the L2CR of the MPC750. The differences are summarized as follows:
* New encoding for L2RAM field defined for PB3 SRAM support
* More output hold options defined for L2OH field
* New L2CR bit for instruction-only mode--L2IO
* New L2CR fields defined for low-power operation and DLL control--L2CS, L2DRO, and L2CTR
The L2CR register can be accessed with the mtspr and mfspr instructions using SPR 1017 (decimal). Note that all bits of L2CR are cleared by a hard reset and on power-on reset. Figure C-17 shows the bits of the L2CR.
[Register diagram: L2CR bits 0 through 31. Fields shown: L2E, L2PE, L2SIZ, L2CLK, L2RAM, L2DO, L2I, L2CTL, L2WT, L2TS, L2OH, L2SL, L2DF, L2BYP, reserved bits, L2IO, L2CS, L2DRO, L2CTR, and L2IP.]
Figure C-17. L2 Cache Control Register (L2CR)
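Because PowerPC registers number bit 0 as the most-significant bit, a field occupying bits m through n of the 32-bit L2CR sits at a left shift of 31 - n. The sketch below shows one way to compose an L2CR value from the field positions given in Table C-27. The macro and function names are hypothetical, only fields in bits 0 through 10 are shown, and writing the value to SPR 1017 with mtspr is not shown.

```c
#include <stdint.h>

/* Hypothetical macro: places 'value' in the field spanning bits
 * msb_bit..lsb_bit, using PowerPC numbering (bit 0 = most significant). */
#define L2CR_FIELD(value, msb_bit, lsb_bit) \
    (((uint32_t)(value) & ((1u << ((lsb_bit) - (msb_bit) + 1)) - 1u)) \
     << (31 - (lsb_bit)))

#define L2CR_L2E        L2CR_FIELD(1, 0, 0)    /* bit 0: L2 enable */
#define L2CR_L2PE       L2CR_FIELD(1, 1, 1)    /* bit 1: parity enable */
#define L2CR_L2SIZ(v)   L2CR_FIELD((v), 2, 3)  /* bits 2-3: L2 size */
#define L2CR_L2CLK(v)   L2CR_FIELD((v), 4, 6)  /* bits 4-6: clock ratio */
#define L2CR_L2RAM(v)   L2CR_FIELD((v), 7, 8)  /* bits 7-8: RAM type */
#define L2CR_L2I        L2CR_FIELD(1, 10, 10)  /* bit 10: global invalidate */

/* Example value only: 1 Mbyte (L2SIZ = 11), divide-by-2 clock
 * (L2CLK = 100), PB3 SRAM (L2RAM = 01), cache not yet enabled. */
uint32_t l2cr_example_config(void)
{
    return L2CR_L2SIZ(3) | L2CR_L2CLK(4) | L2CR_L2RAM(1);
}
```

The example evaluates to 0x38800000. Setting L2CR_L2E in addition would turn the cache on, which the L2E description allows only after the L2 clock is configured and the DLL has stabilized.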
The L2CR bits for the MPC755 are described in Table C-27.
Table C-27. L2 Cache Control Register

Bits   Name    Description
0      L2E     L2 enable. Enables L2 cache operation (including snooping) starting with the next transaction the L2 cache unit receives. Before enabling the L2 cache, the L2 clock must be configured through L2CR[L2CLK], the L2 DLL must stabilize (see the MPC755 Hardware Specifications), and all other L2CR bits must be set appropriately. The L2 cache may need to be globally invalidated.
1      L2PE    L2 data parity generation and checking enable. Enables parity generation and checking for the L2 data RAM interface. When disabled, generated parity is always zeros. Note that the L2 interface always generates and drives parity on the L2DP[0:7] signals for writes to the SRAM array.
2-3    L2SIZ   L2 size. Should be set according to the size of the L2 data RAMs used. A 256-Kbyte L2 cache requires a data RAM configuration of 32 Kbytes x 64 bits; a 512-Kbyte L2 cache requires a configuration of 64 Kbytes x 64 bits; a 1-Mbyte L2 cache requires a configuration of 128 Kbytes x 64 bits.
               00 Reserved
               01 256 Kbyte
               10 512 Kbyte
               11 1 Mbyte
4-6    L2CLK   L2 clock ratio (core-to-L2 frequency divider). Specifies the clock divider ratio between the core clock frequency and the L2 data RAM interface. When these bits are cleared, the L2 clock is stopped and the on-chip DLL for the L2 interface is disabled. For non-zero values, the processor generates the L2 clock and the on-chip DLL is enabled. After the L2 clock ratio is chosen, the DLL must stabilize before the L2 interface can be enabled (see the MPC755 Hardware Specifications). The resulting L2 clock frequency cannot be slower than the clock frequency of the 60x bus interface.
               000 L2 clock and DLL disabled
               001 /1
               010 /1.5
               011 Reserved
               100 /2
               101 /2.5
               110 /3
               111 Reserved
7-8    L2RAM   L2 RAM type--Configures the L2 interface for the type of synchronous SRAMs used:
               * Flow-through (register-buffer) synchronous burst SRAMs that clock addresses in and flow data out
               * Pipelined (register-register) PB2 synchronous burst SRAMs that clock addresses in and clock data out (with 3-1-1-1 access times)
               * Pipelined (register-register) PB3 synchronous burst SRAMs (with 4-1-1-1 access times)
               * Late-write synchronous SRAMs, for which the MPC755 requires a pipelined (register-register) configuration. Late-write RAMs require write data to be valid on the cycle after WE is asserted rather than on the same cycle as the write enable (as required with traditional burst RAMs).
               For the PB2 burst RAM selection, the MPC755 does not burst data into the L2 cache; it generates an address for each access. However, for the PB3 burst RAM selection, the MPC755 does burst data into the L2 cache. If the SRAMs or part of the SRAM is configured as an L2 cache, the L1 caches should be enabled for data to be efficiently loaded into the L2 cache for all types of SRAMs; otherwise, significant latencies are incurred. If all the L2 SRAM cache is configured as private memory, disabled L1 instruction and data caches do not affect the L2 latencies. Pipelined SRAMs may be used for all L2 clock modes. Note that flow-through SRAMs can be used only for L2 clock modes that are divide-by-2 or slower (divide-by-1 and divide-by-1.5 not allowed).
               00 Flow-through (register-buffer) synchronous burst SRAM
               01 Pipelined (register-register) PB3 synchronous burst SRAM
               10 Pipelined (register-register) PB2 synchronous burst SRAM
               11 Pipelined (register-register) synchronous late-write SRAM
9      L2DO    L2 data-only. Setting this bit enables data-only operation in the L2 cache. For this operation, instruction transactions from the L1 instruction cache already cached in the L2 cache can hit in the L2, but new instruction transactions from the L1 instruction cache are treated as cache-inhibited (bypass L2 cache, no L2 checking done). When both L2DO and L2IO are set, the L2 cache is effectively locked (cache misses do not cause new entries to be allocated but write hits use the L2).
10     L2I     L2 global invalidate. Setting L2I invalidates the L2 cache globally by clearing the L2 bits including status bits. This bit must not be set while the L2 cache is enabled.
11     L2CTL   L2 RAM control (ZZ enable). Setting L2CTL enables the automatic operation of the L2ZZ (low-power mode) signal for cache RAMs that support the ZZ function (PB2 RAMs). If L2CTL is set, L2ZZ asserts automatically when the MPC755 enters nap or sleep mode and negates automatically when the MPC755 exits nap or sleep mode. The use of this bit is not recommended for future compatibility. This bit should not be set when the MPC755 is in nap mode and snooping is to be performed through the negation of QACK. Additionally, it should not be set when using PB3 SRAMs.
12     L2WT    L2 write-through. Setting L2WT selects write-through mode (rather than the default write-back mode) so all writes to the L2 cache also write through to the 60x bus. For these writes, the L2 cache entry is always marked as exclusive rather than modified. This bit must never be set after the L2 cache has been enabled because previously-modified lines could get re-marked as exclusive during normal operation.
13     L2TS    L2 test support. Setting L2TS causes cache block pushes from the L1 data cache that result from dcbf and dcbst instructions to be written only into the L2 cache and marked valid, rather than being written only to the 60x bus and marked invalid in the L2 cache in case of a hit. This bit allows a dcbz/dcbf instruction sequence to be used with the L1 cache enabled to easily initialize the L2 cache with any address and data information. This bit also keeps dcbz instructions from being broadcast on the 60x bus and single-beat cacheable store misses in the L2 from being written to the 60x bus.
14-15  L2OH    L2 output hold. These bits configure output hold time for address, data, and control signals driven by the MPC755 to the L2 data RAMs. They should generally be set according to the SRAM's input hold time requirements, for which late-write SRAMs usually differ from flow-through or burst SRAMs. See the MPC755 Hardware Specification for the actual recommended values.
               00 Least hold time
               01 More hold time
               10 Even more hold time
               11 Most output hold time
16     L2SL    L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 110 MHz.
17     L2DF    L2 differential clock. Setting L2DF configures the two clock-out signals (L2CLK_OUTA and L2CLK_OUTB) of the L2 interface to operate as one differential clock. In this mode, the B clock is driven as the logical complement of the A clock. This mode supports the differential clock requirements of late-write SRAMs. Generally, this bit should be set when late-write SRAMs are used.
Appendix C. MPC755 Embedded G3 Microprocessor
MPC755 L2 Cache Interface Operation (Chapter 9)
Table C-27. L2 Cache Control Register (continued)
Bits Name Description
18 L2BYP L2 DLL bypass. The DLL unit receives three input clocks:
    1. A square-wave clock from the PLL unit to phase adjust and export
    2. A non-square-wave clock for the internal phase reference
    3. A feedback clock (L2SYNC_IN) for the external phase reference
    Setting L2BYP causes the non-square-wave clock (#2) to be used for both phase adjust and phase reference (#1 and #2), thus bypassing the square-wave clock from the PLL. (Note that clock #2 is the actual clock used by the registers of the L2 interface circuitry.) L2BYP is intended for use when the PLL is being bypassed. If the PLL is being bypassed, the DLL must be operated in 1:1 mode and SYSCLK must be fast enough for the DLL to support.
19-20 -- Reserved. These bits are implemented but not used; keep at 0 for future compatibility.
21 L2IO L2 instruction-only. Setting this bit enables instruction-only operation in the L2 cache. For this operation, data transactions from the L1 data cache already cached in the L2 cache can hit in the L2 (including writes), but new data transactions (transactions that miss in the L2) from the L1 data cache are treated as cache-inhibited (bypass L2 cache, no L2 checking done). When both L2DO and L2IO are set, the L2 cache is effectively locked (cache misses do not cause new entries to be allocated but write hits use the L2). Note that this bit can be programmed dynamically.
22 L2CS L2 clock stop. Setting this bit causes the L2 clocks to the SRAMs to automatically stop whenever the MPC755 enters nap or sleep modes, and automatically restart when exiting those modes (including for snooping during nap mode). It operates by asynchronously gating off the L2CLK_OUT[A:B] signals while in nap or sleep mode. The L2 SYNC_OUT/SYNC_IN path remains in operation, keeping the DLL synchronized. This bit is provided as a power-saving alternative to the L2CTL bit and its corresponding ZZ pin, which may not be useful for dynamic stopping/restarting of the L2 interface from nap and sleep modes due to the relatively long recovery time from ZZ negation that many SRAM vendors require.
23 L2DRO L2 DLL rollover. Setting this bit enables a potential rollover (or actual rollover) condition of the DLL to cause a checkstop for the processor. A potential rollover condition occurs when the DLL is selecting the last tap of the delay line, and thus may risk rolling over to the first tap with one adjustment while in the process of keeping synchronized. Such a condition is improper operation for the DLL, and, while this condition is not expected, it allows detection for added security. This bit should be set when the DLL is first enabled (set with the L2CLK bits) to detect rollover during initial synchronization. It could also be set when the L2 cache is enabled (with L2E bit) after the DLL has achieved its initial lock.
24-30 L2CTR L2 DLL counter (read-only). These bits indicate the current value of the DLL counter (0 to 127). They are asynchronously read when the L2CR is read, and as such, should be read at least twice with the same value in case the value is asynchronously caught in transition. These bits are intended to provide observability of where in the 128-bit delay chain the DLL is at any given time. Generally, the DLL operation should be considered at risk if it is found to be within a couple of taps of its beginning or end point (tap 0 or tap 128).
31 L2IP L2 global invalidate in progress (read only). This read-only bit indicates whether an L2 global invalidate is occurring. It should be monitored after an L2 global invalidate has been initiated by the L2I bit to determine when it has completed.
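The read-only status fields at the bottom of the L2CR can be picked apart with simple shifts and masks. The following is a minimal Python sketch, not part of the original manual, that models the decoding on a 32-bit integer. It assumes PowerPC bit numbering (bit 0 is the most-significant bit, so bit 31 is the least-significant bit). The names decode_l2cr_status, read_stable_l2ctr, and dll_at_risk are hypothetical, and the margin of 2 taps is only one reading of "within a couple of taps".

```python
L2CTR_SHIFT = 1    # bits 24-30 sit directly above bit 31 (the LSB)
L2CTR_MASK = 0x7F  # 7-bit DLL counter, values 0 to 127
L2IP_MASK = 0x1    # bit 31, global invalidate in progress


def decode_l2cr_status(l2cr):
    """Split a 32-bit L2CR value into its L2CTR and L2IP fields."""
    return {
        "l2ctr": (l2cr >> L2CTR_SHIFT) & L2CTR_MASK,
        "l2ip": bool(l2cr & L2IP_MASK),
    }


def read_stable_l2ctr(read_l2cr, max_tries=8):
    """Read L2CTR until two consecutive reads agree.

    read_l2cr is a caller-supplied function that returns the current
    L2CR value (on real hardware this would wrap an mfspr).
    """
    previous = decode_l2cr_status(read_l2cr())["l2ctr"]
    for _ in range(max_tries):
        current = decode_l2cr_status(read_l2cr())["l2ctr"]
        if current == previous:
            return current
        previous = current
    raise RuntimeError("L2CTR did not settle")


def dll_at_risk(l2ctr, margin=2):
    """Flag a counter value close to either end of the delay line.

    The margin is an assumption; the manual only says "a couple of taps".
    """
    return l2ctr <= margin or l2ctr >= 127 - margin
```

Passing the register read as a function keeps the sketch independent of how the SPR is actually accessed in a given environment.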
C.11.4.2 L2 Private Memory Control Register (L2PM)
The L2 private memory control register is a new register in the MPC755 that allows a portion of the physical address space to be mapped into a portion of the L2 SRAM. It is a read/write, supervisor-level, implementation-specific register (SPR) which is accessed with the mtspr and mfspr instructions using SPR 1016 (decimal). Note that all bits of L2PM are cleared by a hard reset or power-on reset. Figure C-18 shows the bits of the L2PM.
[Register layout: bits 0-13 PMBA; bits 14-29 reserved (0); bits 30-31 PMSIZ]
Figure C-18. L2 Private Memory Control Register (L2PM)
The L2PM bits are described in Table C-28.
Table C-28. L2PM Bit Settings
Bits Name Description
0-13 PMBA Private memory base address. If the upper bits of the physical address match the PMBA, the data is written or read from the private memory space of the L2 SRAM instead of external memory.
    0-11 for 1 Mbyte
    0-12 for 512 Kbytes
    0-13 for 256 Kbytes
14-29 -- Reserved
30-31 PMSIZ Private memory size. These bits along with the L2SIZ bits of the L2CR determine the amount of the L2 cache that is used as private memory space. See Table C-29 for the L2 SRAM configurations.
    00 = Private memory disabled
    01 = 256 Kbytes
    10 = 512 Kbytes
    11 = 1 Mbyte
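As an illustration of the bit settings in Table C-28, the following Python sketch builds an L2PM value and models the address match. It is a sketch under stated assumptions, not a description of the hardware logic: PowerPC bit numbering is assumed (bit 0 is the most-significant bit), the helper names encode_l2pm and hits_private_memory are hypothetical, and the requirement that the base address be aligned to the private memory size is an assumption inferred from the number of PMBA bits compared.

```python
KB = 1024
PMSIZ_ENCODING = {256 * KB: 0b01, 512 * KB: 0b10, 1024 * KB: 0b11}
COMPARED_BITS = {0b01: 14, 0b10: 13, 0b11: 12}  # PMBA bits used per size
PMBA_MASK = 0xFFFC0000                          # bits 0-13 of the register


def encode_l2pm(base, size):
    """Return the L2PM value for a private memory region, or 0 if size is 0."""
    if size == 0:
        return 0  # PMSIZ = 00, private memory disabled
    if size not in PMSIZ_ENCODING:
        raise ValueError("size must be 256 KB, 512 KB, or 1 MB")
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must be a 32-bit physical address")
    if base & (size - 1):
        # Assumption: the region starts on a multiple of its own size.
        raise ValueError("base must be aligned to the region size")
    return (base & PMBA_MASK) | PMSIZ_ENCODING[size]


def hits_private_memory(l2pm, addr):
    """Model the PMBA comparison for a 32-bit physical address."""
    pmsiz = l2pm & 0x3
    if pmsiz == 0:
        return False
    bits = COMPARED_BITS[pmsiz]
    mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return (addr & mask) == (l2pm & mask)
```

For example, a 1-Mbyte region at 0x4000_0000 encodes as 0x4000_0003, and only the upper 12 address bits take part in the match.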
Table C-29 describes the combinations possible (and the required bit settings) for using some or all of the L2 SRAM as private memory.
Table C-29. L2 SRAM Configuration
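Total L2 SRAM: 256 Kbytes
    Configured only as L2 cache: L2E = 1, L2SIZ = 01 (256 Kbytes), PMSIZ = 00 (disabled)
    Configured as 1/2 L2 cache and 1/2 private memory: Not available
    Configured only as private memory: L2E = 0, L2SIZ = don't care, PMSIZ = 01 (256 Kbytes)
Total L2 SRAM: 512 Kbytes
    Configured only as L2 cache: L2E = 1, L2SIZ = 10 (512 Kbytes), PMSIZ = 00 (disabled)
    Configured as 1/2 L2 cache and 1/2 private memory: L2E = 1, L2SIZ = 01 (256 Kbytes), PMSIZ = 01 (256 Kbytes)
    Configured only as private memory: L2E = 0, L2SIZ = don't care, PMSIZ = 10 (512 Kbytes)
Total L2 SRAM: 1 Mbyte
    Configured only as L2 cache: L2E = 1, L2SIZ = 11 (1 Mbyte), PMSIZ = 00 (disabled)
    Configured as 1/2 L2 cache and 1/2 private memory: L2E = 1, L2SIZ = 10 (512 Kbytes), PMSIZ = 10 (512 Kbytes)
    Configured only as private memory: L2E = 0, L2SIZ = don't care, PMSIZ = 11 (1 Mbyte)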
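The combinations in Table C-29 can be captured as a small lookup, which may be convenient when start-up code has to choose settings from a board description. The Python sketch below is illustrative only: the function name sram_config and the mode labels "cache", "split", and "private" are hypothetical, and None stands for both "not available" and "don't care".

```python
# (total Kbytes, mode) -> (L2E, L2SIZ, PMSIZ); None in L2SIZ means don't care.
_TABLE_C29 = {
    (256, "cache"): (1, 0b01, 0b00),
    (256, "private"): (0, None, 0b01),
    (512, "cache"): (1, 0b10, 0b00),
    (512, "split"): (1, 0b01, 0b01),
    (512, "private"): (0, None, 0b10),
    (1024, "cache"): (1, 0b11, 0b00),
    (1024, "split"): (1, 0b10, 0b10),
    (1024, "private"): (0, None, 0b11),
}


def sram_config(total_kbytes, mode):
    """Return the L2E/L2SIZ/PMSIZ settings, or None if not available."""
    entry = _TABLE_C29.get((total_kbytes, mode))
    if entry is None:
        return None
    l2e, l2siz, pmsiz = entry
    return {"L2E": l2e, "L2SIZ": l2siz, "PMSIZ": pmsiz}
```

A 256-Kbyte SRAM has no half-and-half entry, so sram_config(256, "split") returns None.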
C.11.5 L2 Address and Data Parity Signals
The L2 parity signals (L2DP[0:7]) can be generated and checked by setting the L2PE bit in the L2CR. The parity bits are generated and checked using the corresponding L2DATA signals, and represent odd parity. If the L2AP_EN bit in HID2 is also set, the L2ADDR signals are also included in the parity generation and checking (again, representing odd parity) on the MPC755. Table C-30 lists the association between L2DP[0:7] signals and the L2DATA and L2ADDR signals.
Table C-30. L2 Data Parity Signal Associations
Signal   L2AP_EN = 0, L2PE = 1   L2AP_EN = 1, L2PE = 1
L2DP0    L2DATA[0:7]             L2DATA[0:7], L2ADDR[0:2]
L2DP1    L2DATA[8:15]            L2DATA[8:15], L2ADDR[3:4]
L2DP2    L2DATA[16:23]           L2DATA[16:23], L2ADDR[5:6]
L2DP3    L2DATA[24:31]           L2DATA[24:31], L2ADDR[7:8]
L2DP4    L2DATA[32:39]           L2DATA[32:39], L2ADDR[9:10]
L2DP5    L2DATA[40:47]           L2DATA[40:47], L2ADDR[11:12]
L2DP6    L2DATA[48:55]           L2DATA[48:55], L2ADDR[13:14]
L2DP7    L2DATA[56:63]           L2DATA[56:63], L2ADDR[15:16]
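The associations in Table C-30 can be modeled in a few lines. The Python sketch below is an illustration rather than a description of the hardware: it assumes the usual meaning of odd parity (each parity bit is chosen so that the covered signals plus the parity bit contain an odd number of ones), and it sidesteps bit-ordering questions by taking the signal levels as sequences indexed by signal number, so data_bits[n] is the level on L2DATA[n]. The function name l2dp_odd_parity is hypothetical.

```python
# L2ADDR ranges (first, last) folded into each L2DP lane when L2AP_EN = 1.
ADDR_GROUPS = [(0, 2), (3, 4), (5, 6), (7, 8),
               (9, 10), (11, 12), (13, 14), (15, 16)]


def l2dp_odd_parity(data_bits, addr_bits=None):
    """Return the eight L2DP levels for the given signal levels.

    data_bits: 64 values (0 or 1), indexed by L2DATA signal number.
    addr_bits: 17 values (0 or 1), indexed by L2ADDR signal number,
               or None to model L2AP_EN = 0.
    """
    if len(data_bits) != 64:
        raise ValueError("expected 64 data bits")
    if addr_bits is not None and len(addr_bits) != 17:
        raise ValueError("expected 17 address bits")
    parity = []
    for lane in range(8):
        covered = list(data_bits[8 * lane:8 * lane + 8])
        if addr_bits is not None:
            first, last = ADDR_GROUPS[lane]
            covered += list(addr_bits[first:last + 1])
        # Odd parity: the parity bit is 1 when the covered ones count is even.
        parity.append(1 - (sum(covered) & 1))
    return parity
```

With all data signals low and address parity disabled, every L2DP level comes out as 1, because each lane needs one set bit to reach an odd count.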
C.11.6 L2 Cache Programming Considerations
This section describes some of the programming considerations for controlling the L2 cache and the effect of other cache control instructions on the L2 cache.
C.11.6.1 Enabling and Disabling the L2 Cache
Following a power-on or hard reset, the L2 cache and the L2 DLL are disabled initially. Before enabling the L2 cache, the L2 DLL must first be configured through the L2CR register, and the DLL must be allowed sufficient time (see the MPC755 Hardware Specifications) to achieve phase lock. Before enabling the L2 cache, other configuration parameters must be set in the L2CR, and the L2 tags must be globally invalidated. The L2 cache should be initialized during system start-up. The sequence for initializing the L2 cache is as follows:
* Power-on reset (automatically performed by the assertion of HRESET signal).
* Disable L2 cache by clearing L2CR[L2E].
* Set the L2CR[L2CLK] bits to the desired clock divider setting. Setting a non-zero value automatically enables the DLL. All other L2 cache configuration bits should be set to properly configure the L2 cache interface for the SRAM type, size, and interface timing required.
* Wait for the L2 DLL to achieve phase lock. This can be timed by setting the decrementer for a time period equal to 640 L2 clocks, or by performing an L2 global invalidate.
* Perform an L2 global invalidate. The global invalidate could be performed before enabling the DLL, or in parallel with waiting for the DLL to stabilize. Refer to Section C.11.6.2, "L2 Cache Global Invalidation," for more information about L2 cache global invalidation. Note that a global invalidate always takes much longer than it takes for the DLL to stabilize.
* After the DLL stabilizes, an L2 global invalidate has been performed, and the other L2 configuration bits have been set, enable the L2 cache for normal operation by setting the L2CR[L2E] bit to 1.
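The decrementer value for the 640-L2-clock wait depends on the clock ratios of the particular system. The Python sketch below shows one way to work it out. It assumes that the L2 clock is the core clock divided by the L2CR[L2CLK] divider and that the decrementer counts at one-fourth of the bus clock (as stated for the MPC750 elsewhere in this manual). The function name and the example frequencies are hypothetical.

```python
import math
from fractions import Fraction

DLL_LOCK_L2_CLOCKS = 640  # wait period named in the initialization sequence


def dll_lock_decrementer_ticks(core_hz, bus_hz, l2_divider):
    """Return the decrementer count that spans 640 L2 clocks, rounded up.

    core_hz:    processor core frequency in Hz
    bus_hz:     60x bus frequency in Hz (the decrementer runs at bus_hz / 4)
    l2_divider: core-to-L2 clock divider, for example 2 or Fraction(5, 2)
    """
    l2_hz = Fraction(core_hz) / Fraction(l2_divider)
    wait_seconds = Fraction(DLL_LOCK_L2_CLOCKS) / l2_hz
    decrementer_hz = Fraction(bus_hz, 4)
    return math.ceil(wait_seconds * decrementer_hz)
```

For a hypothetical 400-MHz core with a 100-MHz bus and a divide-by-2 L2 clock, 640 L2 clocks last 3.2 microseconds, which is 80 decrementer ticks at 25 MHz. Using Fraction keeps dividers such as 2.5 exact before the final round-up.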
Note that if the L1 data cache is disabled and the L2 cache is enabled, hits in the L2 work correctly and update the L2. However, no new entries are allocated into the L2 because when the L1 data cache is disabled, the processor only performs single-beat accesses. Thus, these accesses all propagate to the 60x bus interface (the L2 only stores and allocates entries for burst accesses). Before the L2 cache is disabled it must be flushed to prevent coherency problems. Note that the cache management instructions dcbf, dcbst, and dcbi do not affect the L1 data cache or L2 cache when they are disabled.
C.11.6.2 L2 Cache Global Invalidation
The L2 cache supports a global invalidation function in which all bits of the L2 tags (tag data bits, tag status bits, and LRU bit) are cleared. It is performed by an on-chip hardware state machine that sequentially cycles through the L2 tags. The global invalidation function is controlled through L2CR[L2I], and it must be performed only while the L2 cache is disabled. The MPC755 can continue operation during a global invalidation provided the L2 cache has been properly disabled before the global invalidation operation starts. Note that the MPC755 must be operating at full power (low power modes disabled) in order to perform L2 cache invalidation. The sequence for performing a global invalidation of the L2 cache is as follows:
* Clear HID0[DPM] bit to zero. Dynamic power management must be disabled.
* Execute a sync instruction to finish any pending store operations in the load/store unit, disable the L2 cache by clearing L2CR[L2E], and execute an additional sync instruction after disabling the L2 cache to ensure that any pending operations in the L2 cache unit have completed.
* Initiate the global invalidation operation by setting the L2CR[L2I] bit to 1.
* Monitor the L2CR[L2IP] bit to determine when the global invalidation operation is completed (indicated by the clearing of L2CR[L2IP]). The global invalidation requires approximately 32K core clock cycles to complete.
* After detecting the clearing of L2CR[L2IP], clear L2CR[L2I] and re-enable the L2 cache for normal operation by setting L2CR[L2E]. Also, dynamic power management can be enabled at this time.
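The ordering of the steps above can be sketched as a small Python model. This is not driver code: register access and the sync instruction are stand-ins supplied by the caller, FakeRegs is a hypothetical test double that reports L2IP as set for a few reads, and the bit masks for L2CR[L2E] (bit 0) and HID0[DPM] (bit 11) are assumptions taken from the general MPC750 register layout rather than from this section. L2I (bit 10) and L2IP (bit 31) follow Table C-27.

```python
L2E = 0x80000000   # L2CR bit 0 (assumed position)
L2I = 0x00200000   # L2CR bit 10
L2IP = 0x00000001  # L2CR bit 31, read-only
DPM = 0x00100000   # HID0 bit 11 (assumed position)
WORD = 0xFFFFFFFF


class FakeRegs:
    """Hypothetical register model: L2IP reads as 1 for a few polls."""

    def __init__(self, hid0=DPM, l2cr=L2E, busy_reads=3):
        self.values = {"HID0": hid0, "L2CR": l2cr}
        self.busy_reads = busy_reads
        self._busy = 0

    def read(self, name):
        value = self.values[name]
        if name == "L2CR" and self._busy > 0:
            self._busy -= 1
            return value | L2IP
        return value

    def write(self, name, value):
        if name == "L2CR" and value & L2I and not self.values["L2CR"] & L2I:
            self._busy = self.busy_reads  # invalidate starts on a 0-to-1 edge
        self.values[name] = value & WORD & ~L2IP


def global_invalidate(regs, sync, max_polls=100000):
    """Walk through the global invalidation sequence in order."""
    saved_dpm = regs.read("HID0") & DPM
    regs.write("HID0", regs.read("HID0") & ~DPM & WORD)  # DPM off
    sync()                                               # drain pending stores
    regs.write("L2CR", regs.read("L2CR") & ~L2E & WORD)  # disable the L2
    sync()                                               # drain the L2 unit
    regs.write("L2CR", regs.read("L2CR") | L2I)          # start invalidation
    for _ in range(max_polls):
        if not regs.read("L2CR") & L2IP:                 # wait for L2IP = 0
            break
    else:
        raise TimeoutError("L2IP never cleared")
    regs.write("L2CR", (regs.read("L2CR") & ~L2I & WORD) | L2E)
    regs.write("HID0", regs.read("HID0") | saved_dpm)    # restore DPM
```

The poll limit is arbitrary. On hardware the text above suggests about 32K core clocks for completion, so a real bound would be derived from that figure.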
C.11.6.3 L2 Cache Flushing
L1 cache-block-push operations generated by the execution of dcbf and dcbst instructions write through to the 60x bus interface and invalidate the L2 cache sector if they hit. The execution of dcbf and dcbst instructions that do not cause a cache-block-push from the L1 cache are forwarded to the L2 cache to perform a sector invalidation and/or push from the L2 cache to the 60x bus as required. If the dcbf and dcbst instructions do not cause a sector push from the L2 cache, they are forwarded to the 60x bus interface for address-only broadcast if HID0[ABE] is set to 1.
C.11.6.4 Other Cache Control Instructions and Effect on L2 Cache
The execution of the stwcx. instruction results in single-beat writes from the L1 data cache. These single-beat writes are processed by the L2 cache according to hit/miss status, L1 and L2 write-through configuration, and reservation-active status. If the address associated with the stwcx. instruction misses in the L2 cache or if the reservation is no longer active, the stwcx. instruction bypasses the L2 cache and is forwarded to the 60x bus interface. If the stwcx. hits in the L2 cache and the reservation is still active, one of the following actions occurs:
* If the stwcx. hits a modified sector in the L2 cache (independent of write-through status), or if the stwcx. hits both the L1 and L2 caches in copy-back mode, the stwcx. is written to the L2 and the reservation completes.
* If the stwcx. hits an unmodified sector in the L2 cache, and either the L1 or L2 is in write-through mode, the stwcx. is forwarded to the 60x bus interface and the sector hit in the L2 cache is invalidated.
The dcbi instruction is always forwarded to the L2 cache and causes a sector invalidation if a hit occurs. The dcbi instruction is also forwarded to the 60x bus interface for broadcast if HID0[ABE] is set to 1. The icbi instruction invalidates only L1 cache blocks and is never forwarded to the L2 cache.
Any dcbz instructions marked global do not affect the L2 cache state. If a dcbz instruction hits in the L1 and L2 caches, the L1 data cache block is cleared and the dcbz instruction completes. If a dcbz instruction misses in the L2 cache, it is forwarded to the 60x bus interface for broadcast. Any dcbz instructions that are marked nonglobal act only on the L1 data cache. Note that the dcbz instruction on the MPC755 must be preceded by a dcbf instruction to that address.
The sync and eieio instructions bypass the L2 cache and are forwarded to the 60x bus.
C.11.6.5 Cache Control Instructions and Effect on Private Memory Operation
When private memory is used as all or part of the L2 interface, cache control instructions function as follows:
* Cacheable stwcx. operations are handled by the L1 data cache similarly to normal cacheable stores. The L2 interface does not treat stwcx. differently than a normal cacheable store. Cache-inhibited stwcx. accesses that hit in the private memory space write the appropriate data to the L2 interface and are not forwarded to the system interface.
* dcbz operations that hit in the private memory space do not affect the data in the external SRAMs. They are handled entirely by the L1.
* dcbf operations are issued to the L2 interface after being processed by the L1 data cache. If a dcbf that hits in L1 data cache and requires a line push hits in the private memory space, the cache line is written to the L2 interface. dcbf operations that hit in the private memory space are never forwarded to the system interface.
* dcbst instructions are issued to the L2 cache after being processed by the L1 data cache. If a dcbst that hits in the L1 data cache and requires a line push hits in the private memory space, the cache line is written to the external SRAMs. dcbst operations that hit in the private memory space are never forwarded to the system interface.
* dcbi instructions that hit in the private memory space are discarded and are never forwarded to the system interface.
* icbi instructions never affect the L2 interface and are just passed to the system interface for further processing.
* sync, eieio, eciwx, ecowx, tlbie, and tlbsync instructions pass through the L2 interface and are forwarded to the system interface for further processing.
C.11.6.6 L2 Cache Testing
Several features are provided to facilitate testing of the L2 cache. The original MPC750 User's Manual supplied some incorrect recommended procedures for testing the L2 cache. This section contains a corrected L2 cache test description that applies for both the MPC750 and the MPC755. A typical test for verifying the proper operation of the MPC755 L2 cache memory (external SRAM and tag) performs the following steps:
1. Initialize the test sequence by disabling address translation to invoke the default WIMG setting of 0b0011.
2. Set the L2CR[L2DO] and L2CR[L2TS] bits and perform a global invalidation of the L1 data cache and the L2 cache. The L1 instruction cache can remain enabled to improve execution efficiency.
After initialization of the test sequence is complete, the L2 cache external SRAM may be tested using the following procedure:
1. Enable the L2 cache and the L1 data cache. Caches should have been invalidated during the initialization step.
2. Execute a series of dcbz, stw, and dcbf instructions to initialize the cache with a sequential range of addresses and with cache data consisting of zeros.
3. Disable the L1 data cache.
4. Initialize the performance monitor counters to zero, and enable counting of L2 hits in the appropriate MMCR register. Refer to Chapter 11, "Performance Monitor," for complete details on using the performance monitors.
5. Perform a series of single-beat load and store operations using a variety of non-zero bit patterns to test for stuck bits and pattern sensitivities in the L2 cache SRAM. These loads and stores should be in the range of addresses used to initialize the caches in step 2 so that each access will hit in the L2 cache.
6. Disable the performance monitor counters, and read the value for the L2 cache hits. Verify that this result matches the accesses performed by the test routine.
A complete L2 cache test should test the tag memory as well as the SRAMs. Each bit of tag memory should be tested by loading the cache tags with data consisting of all zeros in one way of the cache and all ones in the other way. Then, a series of accesses should be performed, walking a one or zero through the upper address bits to test for stuck bits and pattern sensitivities in the tag. The number of tag bits used by the cache depends on the size of the cache. On the MPC750 and the MPC755, a 256-Kbyte cache uses 15 tag bits, a 512-Kbyte cache uses 14 tag bits, and a 1-Mbyte cache uses 13 tag bits. For example, to test all the tag bits of a 512-Kbyte cache, a test program needs to do the following:
* Initialize the test sequence by disabling address translation to invoke the default WIMG bit settings of 0b0011.
* Set the L2CR[L2DO] and L2CR[L2TS] bits and perform an invalidation of the L1 data cache and the L2 cache. The L1 instruction cache may remain enabled for efficiency.
* Enable the L2 cache and the L1 data cache.
* Perform a series of dcbz, stw, and dcbf operations to fill the cache with unique data. Fill way 0 of the tag with data consisting of all zeros, and fill way 1 of the tag with data consisting of all ones. The following pseudocode illustrates this procedure:
    cache_size      = (512 * 1024)   // 512 Kbyte
    cache_line_size = 32             // 32 byte cache line size for 750
    tag_bits        = 14             // for 512 Kbyte cache
    r10 = 0x00000000                 // all zeros in upper tag_bits bits
    r11 = 0xFFFC0000                 // all ones in upper tag_bits bits
    r12 = 0                          // index

    for (i = 0; i < ((cache_size / 2) / cache_line_size); i++) {
        dcbz r10,r12                 // zero out line in L1 dcache
        add  r13,r10,r12             // create unique data
        stwx r13,r10,r12             // store unique data in newly
                                     // allocated L1 cache entry
        dcbf r10,r12                 // push data to L2 cache WAY 0
        dcbz r11,r12                 // zero out line in L1 dcache
        add  r13,r11,r12             // create unique data
        stwx r13,r11,r12             // store unique data in newly
                                     // allocated L1 cache entry
        dcbf r11,r12                 // push data to L2 cache WAY 1
        r12 += cache_line_size       // go to next cache line and repeat
    }
* Disable the L1 data cache.
* Read back the data just written and verify its correctness. Use the performance monitors to count load hits in the L2 to verify that the data came from the L2. The number of hits should equal the number of loads.
* Attempt a series of loads from the cache with addresses that should not be in the tag by walking a one through the upper tag bits:

    r15 = 0x80000000                 // address with a 1 in the top bit
    for (i = 0; i < tag_bits; i++) {
        initialize/enable the performance monitor counters to count load hits
        r12 = 0x00000000             // index
        for (j = 0; j < ((cache_size / 2) / cache_line_size); j++) {
            lwzx r13,r15,r12         // attempt to load data
            r12 += cache_line_size   // go to the next cache line
        }
        disable the performance monitors, check to ensure that there were no hits
        r15 = r15 >> 1               // shift the one bit to the right
                                     // for the next iteration
    }

* Then perform a similar series of loads, this time by walking a zero through a series of addresses with ones in the upper tag bits. The first iteration of the inner loop above uses the start address 0x7FFC_0000, the second iteration uses the start address 0xBFFC_0000, the third 0xDFFC_0000, and so on for each tag bit for the case of a 512-Kbyte cache. If there are any load hits at any point in the loop, there is a faulty tag in the cache.
* Repeat the entire process, this time with all ones in the way 0 tag entries, and all zeros in the way 1 tag entries. (r10 = 0xFFFC_0000 and r11 = 0x0000_0000 in the pseudocode for the fourth step above for a 512-Kbyte cache.)
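The start addresses for the walking-one and walking-zero passes follow directly from the number of tag bits. The Python sketch below, which is an illustration and not part of the test procedure itself, generates both address lists for the three cache sizes named above. The function names are hypothetical.

```python
KB = 1024
TAG_BITS = {256 * KB: 15, 512 * KB: 14, 1024 * KB: 13}
TOP_BIT = 0x80000000


def tag_mask(cache_size):
    """Mask with ones in the upper tag bits of a 32-bit address."""
    bits = TAG_BITS[cache_size]
    return (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF


def walking_one_bases(cache_size):
    """Start addresses with a single one walked through the tag bits."""
    return [TOP_BIT >> i for i in range(TAG_BITS[cache_size])]


def walking_zero_bases(cache_size):
    """Start addresses with a single zero walked through the tag bits."""
    mask = tag_mask(cache_size)
    return [mask & ~(TOP_BIT >> i) for i in range(TAG_BITS[cache_size])]


if __name__ == "__main__":
    for base in walking_zero_bases(512 * KB)[:3]:
        print(f"0x{base:08X}")
```

For a 512-Kbyte cache the tag mask is 0xFFFC_0000, and the first three walking-zero start addresses are 0x7FFC_0000, 0xBFFC_0000, and 0xDFFC_0000, matching the values quoted in the procedure.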
Caution: For these L2 cache tests, instruction translation is disabled and the L1 instruction cache is enabled. This means that WIMG defaults to 0b0011. Even though the L2 cache is in data-only mode, if an address in the L2 matches an instruction access, the L2 will hit and provide data for that access. Therefore, cache test programs should avoid loading the L2 with address ranges that match the memory location of the test code. Otherwise, instruction accesses will hit on test data and cause random program behavior. For the test procedure described here, the test program should be located outside the address ranges 0x0000_0000 + cache_size and 0xFFFF_FFFF - cache_size. The entire L2 cache may be tested by clearing L2CR[L2DO] and L2CR[L2TS], restoring the L1 and L2 caches to their normal operational state, and executing a comprehensive test program designed to exercise all the caches. The test program should include operations that cause L2 hit, reload, and castout activity that can be subsequently verified through the performance monitor. Most of the tests described in this section only use the performance monitors to verify the number of cache hits that occurred during the test. While the performance monitors also provide facilities for counting L2 cache misses, this facility is only useful for counting L2 cache misses that cause burst reads to memory to occur. With the L1 data cache disabled and the L2CR[L2TS] bit set, all accesses are single-beat and therefore are not counted by the MPC750's performance monitor as L2 cache misses. The performance monitors can only be used to count misses when the L1 cache is enabled.
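A build or load script can check that the test program stays clear of the address ranges the test data occupies. The Python sketch below takes one reading of the caution above, namely that the excluded ranges are the bottom cache_size bytes and the top cache_size bytes of the 32-bit address space. That interpretation, and the function name overlaps_test_ranges, are assumptions.

```python
ADDRESS_SPACE = 0x1_0000_0000  # size of the 32-bit physical address space


def overlaps_test_ranges(code_start, code_length, cache_size):
    """Return True if the code image touches either excluded range.

    Excluded (assumed): [0, cache_size) and
    [ADDRESS_SPACE - cache_size, ADDRESS_SPACE).
    """
    code_end = code_start + code_length  # exclusive end of the image
    excluded = (
        (0, cache_size),
        (ADDRESS_SPACE - cache_size, ADDRESS_SPACE),
    )
    # Two half-open ranges overlap when each starts before the other ends.
    return any(code_start < high and low < code_end for low, high in excluded)
```

For a 512-Kbyte cache, a test image loaded at 0x0010_0000 passes the check, while one loaded at 0x0001_0000 does not.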
C.11.7 L2 Cache SRAM Timing Examples
Chapter 9, "L2 Cache Interface Operation," describes the signal timing for the three types of SRAM (flow-through burst SRAM, pipelined burst SRAM, and late-write SRAM) supported by the MPC750 L2 cache interface. This section provides example timing diagrams for the new PB3 synchronous burst SRAMs supported by the MPC755. The timing diagrams illustrate the best case logical (ideal, not AC-timing accurate) interface operations. For proper interface operation, the designer must select SRAMs that support the signal sequencing illustrated in the timing diagrams. Note that the PB3 SRAMs operate differently from the PB2 SRAMs, and require a different configuration setting in L2CR. PB3 SRAMs provide the efficiencies of the late-write SRAMs, but operate more like traditional PB2 SRAMs (that is, there is no internal write queue). They may be available at speeds comparable to late-write SRAMs, but closer to PB2 prices. They achieve their speed/price benefits by staging the initial internal array access over two clock cycles, thereby requiring an additional wait state for the first read data beat.
C.11.7.1 Pipelined PB3 Burst SRAM
Pipelined burst SRAMs operate at higher frequencies than flow-through burst SRAMs by clocking the read data from the memory array into a buffer before driving the data onto the data bus. This causes initial read accesses by the pipelined burst SRAMs to occur one cycle later than flow-through burst SRAMs, but the L2 bus frequencies supported can be higher. Note that the MPC750 L2 cache interface requires the use of single-cycle deselect pipelined burst SRAMs for proper operation. Some PB3 SRAM devices have strobes with data latches that allow for very late clocking. The MPC755 doesn't support this feature. The MPC755 supports strobeless use of the PB3 devices and all timing (including setup times) must meet the specifications described in the MPC755 Hardware Specifications. Figure C-19 shows a burst read-read-read memory access sequence when the L2 cache interface is configured with PB3 burst SRAMs.
[Timing diagram: signals SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData over clock cycles 1-21]
SRAMAddress: burst rd r0, burst rd r1, burst rd r2
SRAMMemory: sel0 r0a r0b r0c r0d sel1 r1a r1b r1c r1d sel2 r2a r2b r2c r2d
SRAMData: r0a r0b r0c r0d hi-z r1a r1b r1c r1d hi-z r2a r2b r2c r2d
Notes: For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), and 1 cycle to deselect if write.
Figure C-19. Burst Read-Read-Read L2 Cache Access (Pipelined)
Figure C-20 shows a burst write-write-write memory access sequence when the L2 cache interface is configured with PB3 burst SRAMs.
[Timing diagram: signals SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData over clock cycles 1-21]
SRAMAddress: burst wr w0, burst wr w1, burst wr w2
SRAMMemory: sel0 w0a w0b w0c w0d dsel sel1 w1a w1b w1c w1d dsel sel2 w2a w2b w2c w2d dsel
SRAMData: w0a w0b w0c w0d w1a w1b w1c w1d w2a w2b w2c w2d
Notes: For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), and 1 cycle to deselect if write.
Figure C-20. Burst Write-Write-Write L2 Cache Access (Pipelined)
Figure C-21 shows a burst read-write-read memory access sequence when the L2 cache interface is configured with PB3 burst SRAMs.
[Timing diagram: signals SRAMClk, L2CE, L2WE, SRAMAddress, SRAMMemory, and SRAMData over clock cycles 1-21]
SRAMAddress: burst rd r0, burst wr w1, burst rd r2
SRAMMemory: sel0 r0a r0b r0c r0d sel1 w1a w1b w1c w1d dsel sel2 r2a r2b r2c r2d
SRAMData: r0a r0b r0c r0d hi-z w1a w1b w1c w1d hi-z r2a r2b r2c r2d
Notes: For PB3, L2ZZ is reused as L2ADS and asserts during the 1st clock only of each L2CE assertion. For PB3, the internal array access requires 1 cycle to row select, 1 cycle for each column select of burst (a-d), and 1 cycle to deselect if write.
Figure C-21. Burst Read-Write-Read L2 Cache Access (Pipelined)
C.11.8 Private Memory SRAM Timing
The timing for private memory SRAM is the same as the L2 cache timing described in Section C.11.7, "L2 Cache SRAM Timing Examples."
C.12 Power and Thermal Management (Chapter 10)
The power and thermal management of the MPC755 functions the same as that of the MPC750, and is completely described in Chapter 10, "Power and Thermal Management," except for the restriction on global L2 cache invalidation described in Section C.11.6.2, "L2 Cache Global Invalidation." Additionally, for both the MPC750 and MPC755, no combination of the thermal assist unit, the decrementer register, and the performance monitor can be used at any one time. If exceptions for any two of these functional blocks are enabled together, multiple exceptions caused by any of these three blocks cause unpredictable results.
C.13 Performance Monitor (Chapter 11)
The performance monitor of the MPC755 functions the same as that of the MPC750, and is completely described in Chapter 11, "Performance Monitor," except that for both the MPC750 and MPC755, no combination of the thermal assist unit, the decrementer register, and the performance monitor can be used at any one time. If exceptions for any two of these functional blocks are enabled together, multiple exceptions caused by any of these three blocks cause unpredictable results.
Appendix D User's Manual Revision History
for the MPC750 RISC Microprocessor Family
This appendix provides a list of the major differences between Revision 0 and Revision 1 of the MPC750 RISC Microprocessor User's Manual. These corrections also apply to the MPC740, the MPC755, and the MPC745, which are described in MPC750 RISC Microprocessor Family User's Manual. For convenience, the section number and page number of the errata item in the original user's manual are provided. Note that the list only includes the major changes to the user's manual.
Section, Page Change
Throughout the UM Added references to Appendix C, "MPC755 Embedded G3 Microprocessor," and added the appendix.
1.1, 1-3 In Figure 1-1, the 60x BIU is connected to L1 cache and the data path between the 60x BIU and L2 BIU is 64-bit. Also integer unit 1 should have only an add sign, and integer unit 2 should have the add, multiply, and divide signs.
1.2.1, 1-4 Remove the multiply and divide instructions inside the parentheses of the IU2 description, and the sentence should read as follows: "IU2 can execute all integer instructions except multiply and divide instructions (shift, rotate, arithmetic, and logical instructions)."
2.1.1, 2-7 The implementation note for the decrementer register (DEC) should read as follows: "In the MPC750, the decrementer register is decremented and the time base is incremented at a speed that is one-fourth the speed of the bus clock."
2.1.2.2, 2-9 In Figure 2-3, the DBP bit in HID0 register should not be reserved.
[Block diagram: instruction unit (fetcher, 64-entry BTIC, BHT, branch processing unit with CTR and LR, 6-word instruction queue); instruction MMU (shadow SRs, IBAT array, ITLB); 32-Kbyte I cache with tags; dispatch unit; integer unit 1; integer unit 2; system register unit; floating-point unit with FPSCR; load/store unit (EA calculation, store queue); reservation stations; GPR file and FPR file, each with 6 rename buffers; completion unit with 6-entry reorder buffer; data MMU (original SRs, DBAT array, DTLB); 32-Kbyte D cache with tags; 60x bus interface unit (instruction fetch queue, L1 castout queue, data load queue; 32-bit address bus, 64-bit data bus); L2 bus interface unit (L2 castout queue, L2 controller, L2CR, L2 tags; 17-bit L2 address bus, 64-bit L2 data bus; not in the MPC740). Additional features: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management.]
2.1.2.2, 2-9
In Table 2-4, replace the description of HID0[DBP] (bit 1) with the following:
1 DBP Disable 60x bus address and data parity generation.
    0 The system generates address and data parity.
    1 Parity generation is disabled and parity signals are driven to 0 during bus operations. When parity generation is disabled, all parity checking should also be disabled and parity signals need not be connected.
Replace the description of HID0[BTIC] (bit 26), with the following:
26 BTIC BTIC enable. Used to enable use of the 64-entry branch target instruction cache.
  0 The BTIC contents are invalidated and the BTIC behaves as if it were empty. New entries cannot be added until the BTIC is enabled.
  1 The BTIC is enabled and new entries can be added.
2.1.2.2, 2-12
In Table 2-4, the description of HID0[IFEM] (bit 23) should read as follows (the change is to the zero setting):
23 IFEM Enable M bit on bus for instruction fetches.
  0 M bit not reflected on bus for instruction fetches. Instruction fetches are treated as nonglobal on the bus.
  1 Instruction fetches reflect the M bit from the WIM settings.
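The HID0 bit numbers quoted above follow the PowerPC convention in which bit 0 is the most significant bit of the 32-bit register. As a hedged illustration (the helper name and the register image are assumptions, not part of the manual), the bit numbers can be converted to masks like this:

```python
def ppc_bit_mask(bit: int, width: int = 32) -> int:
    """Mask for a bit numbered PowerPC-style (bit 0 = most significant)."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit register")
    return 1 << (width - 1 - bit)


# Bit numbers are the ones given in the Table 2-4 corrections above.
HID0_DBP = ppc_bit_mask(1)     # disable 60x bus parity generation
HID0_IFEM = ppc_bit_mask(23)   # reflect M bit on instruction fetches
HID0_BTIC = ppc_bit_mask(26)   # enable the BTIC

print(f"{HID0_DBP:#010x}")   # 0x40000000
print(f"{HID0_IFEM:#010x}")  # 0x00000100
print(f"{HID0_BTIC:#010x}")  # 0x00000020

# Hypothetical register image: enable the BTIC, leave parity generation on.
hid0 = 0
hid0 |= HID0_BTIC
hid0 &= ~HID0_DBP & 0xFFFFFFFF
print(f"{hid0:#010x}")       # 0x00000020
```

On real hardware the value would be written with mtspr in supervisor mode; the sketch only models the mask arithmetic.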
2.1.2.4.5, 2-18
Replace Table 2-11 with the following:
Encoding     Description
00 0000      Register holds current value.
00 0001      Counts processor cycles.
00 0010      Counts completed instructions. Does not include folded branches.
00 0011      Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100      Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101      Counts L1 instruction cache misses.
00 0110      Counts ITLB misses.
00 0111      Counts L2 instruction misses.
00 1000      Counts branches predicted or resolved not taken.
00 1001      Counts MSR[PR] bit toggles.
00 1010      Counts times reserved load operations completed.
00 1011      Counts completed load and store instructions.
00 1100      Counts snoops to the L1 and the L2.
00 1101      Counts L1 cast-outs to the L2.
00 1110      Counts completed system unit instructions.
00 1111      Counts instruction fetch misses in the L1.
01 0000      Counts branches allowing out-of-order execution that resolved correctly.
All others   Reserved.
2.1.5, 2-26
In Table 2-18, replace the description of L2CR[L2SL] (bit 16) with the following:
16 L2SL L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated below 150 MHz.
2.3.4.3.10, 2-52
Add the following footnote for the stfd instruction in Table 2-39: The MPC750 and MPC755 require that the FPRs be initialized with floating-point values before the stfd instruction is used. Otherwise, a random power-on value for an FPR may cause unpredictable device behavior when the stfd instruction is executed. Note that any floating-point value loaded into the FPRs is acceptable.
2.3.6.3.2, 2-67
Add the following note as a footnote to the mtsr and mtsrin instructions in Table 2-59: The MPC750 and MPC755 have a restriction on the use of the mtsr and mtsrin instructions not described in the Programming Environments Manual. The MPC750 and MPC755 require that an isync instruction be executed after either an mtsr or mtsrin instruction. This isync instruction must occur after the execution of the mtsr or mtsrin and before the data address translation mechanism uses any of the on-chip segment registers.
3.4.2.2, 3-16
Add the following to the end of the section: Both the MPC750 and MPC755 processors require protection in the use of the dcbz instruction in order to guarantee cache coherency in a multiprocessor system. Specifically, the dcbz instruction must be:
* Either enveloped by high-level software synchronization protocols (such as semaphores), or
* Preceded by execution of a dcbf instruction to the same address.
One of these precautions must be taken in order to guarantee that there are no simultaneous cache hits from a dcbz instruction and a snoop to that address. If these two events occur simultaneously, stale data may occur, causing system failures.
4.2, 4-6
The machine check exception entry in Table 4-3 should read as follows:
Table 4-3. MPC750 Exception Priorities (continued)
Priority   Exception       Cause
Asynchronous Exceptions (Interrupts)
1          Machine check   Any enabled machine check condition (L2 data parity error, assertion of TEA or MCP)
4.5.2, 4-14
Remove the reference to parity error in L1 cache from machine check exception conditions. Thus, the first paragraph of this section should read as follows: The MPC750 implements the machine check exception as defined in the PowerPC architecture (OEA). It conditionally initiates a machine check exception after an address or data parity error occurs on the bus or in the L2 cache, after receiving a qualified transfer error acknowledge (TEA) indication on the MPC750 bus, or after the machine check interrupt (MCP) signal had been asserted. As defined in the OEA, the exception is not taken if MSR[ME] is cleared, in which case the processor enters the checkstop state.
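The last sentence of the corrected paragraph describes a two-way outcome that depends on MSR[ME]. The sketch below models only that decision; the function name and return strings are illustrative, and the MSR[ME] position (bit 19 in PowerPC numbering, mask 0x00001000) comes from the 32-bit PowerPC architecture definition rather than from this section.

```python
MSR_ME = 1 << (31 - 19)  # MSR[ME]; bit 19 in PowerPC numbering (0x00001000)


def machine_check_outcome(msr: int, condition_detected: bool) -> str:
    """Model the outcome of an enabled machine check condition.

    condition_detected stands for any qualifying event named in the text:
    an address or data parity error on the bus or in the L2 cache, a
    qualified TEA, or assertion of MCP.
    """
    if not condition_detected:
        return "no action"
    if msr & MSR_ME:
        return "machine check exception"
    return "checkstop"


print(machine_check_outcome(0x00001000, True))   # machine check exception
print(machine_check_outcome(0x00000000, True))   # checkstop
```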
4.5.11, 4-20
Remove Table 4-10, "Trace Exception--SRR1 Settings," and its introductory text. This interrupt is implemented as defined by the OEA.
5.1.7, 5-18
In Table 5-4, delete the second row in the table (lwarx or stwcx. with W = 1).
5.1.8, 5-19
In Table 5-5, remove the next-to-last paragraph ("In addition, depending...") from the tlbie description.
5.4.3.1, 5-26
The next-to-last paragraph should read as follows: The TLB entries are on-chip copies of PTEs in the page tables in memory and are similar in structure. To uniquely identify a TLB entry as the required PTE, the TLB entry also contains four more bits of the page index, EA[10-13] (in addition to the API bits in the PTE).
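To make the EA[10-13] remark in the TLB paragraph concrete, the sketch below extracts bit fields from a 32-bit effective address using PowerPC bit numbering. The helper name and the sample address are hypothetical, and the field boundaries other than EA[10-13] (API = EA[4-9], page index = EA[4-19]) are assumptions based on the 32-bit PowerPC page translation definition, not on this section.

```python
def ea_field(ea: int, first: int, last: int) -> int:
    """Return EA[first-last] (inclusive), with bit 0 as the MSB of 32 bits."""
    width = last - first + 1
    return (ea >> (31 - last)) & ((1 << width) - 1)


ea = 0x12345678                      # hypothetical effective address
page_index = ea_field(ea, 4, 19)     # full 16-bit page index
api = ea_field(ea, 4, 9)             # abbreviated page index held in the PTE
extra = ea_field(ea, 10, 13)         # the four additional bits kept in the TLB entry

print(hex(page_index), api, extra)   # 0x2345 8 13
```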
5.4.3.1, 5-27
The second sentence in the second paragraph should read as follows: ITLB miss exception conditions are reported when there are no more instructions to be dispatched or retired (the pipeline is empty).
5.4.3.1, 5-27
The second sentence in the fourth paragraph should read as follows: Thus, TLB entries must be explicitly cleared by the system software (with the tlbie instruction) before address translation is enabled.
5.4.4, 5-29
Figure 5-8 in the original manual incorrectly shows the loopback arrow on the left side pointing to the node above the word 'Otherwise'. Replace Figure 5-8 with the following:
[Figure 5-8 flowchart not reproduced. Node labels, in reading order: Effective Address Generated; Page Address Translation; Generate 52-Bit Virtual Address from Segment Descriptor; Compare Virtual Address with TLB Entries; TLB Hit Case; Instruction Fetch with N-Bit Set in Segment Descriptor (No-Execute) / Otherwise (see Figure 5-6); dcbz Instruction with W or I = 1 (Alignment Exception) / Otherwise; Check Page Memory Protection Violation Conditions (see The Programming Environments Manual); Access Prohibited (Page Memory Protection Violation; see The Programming Environments Manual) / Access Permitted; Store Access with PTE[C] = 0 (Page Table Search Operation; see Figure 5-9) / Otherwise; PA[0-31] = RPN || A[20-31]; Continue Access to Memory Subsystem with WIMG-Bits from PTE.]
Figure 5-8. Page Address Translation Flow--TLB Hit
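The final step of the TLB-hit flow forms the physical address by concatenating the real page number with the low-order address bits, PA[0-31] = RPN || A[20-31]. A minimal sketch of that concatenation follows; the function name and sample values are illustrative assumptions, and the 20-bit RPN width follows from the 12-bit byte offset A[20-31].

```python
def physical_address(rpn: int, ea: int) -> int:
    """PA[0-31] = RPN || A[20-31].

    rpn is the 20-bit real page number from the matching PTE/TLB entry;
    the low 12 bits of the effective address pass through unchanged.
    """
    return ((rpn & 0xFFFFF) << 12) | (ea & 0xFFF)


# Hypothetical values: RPN 0x00ABC, effective address 0x12345678.
print(hex(physical_address(0x00ABC, 0x12345678)))  # 0xabc678
```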
6.7, 6-32
The last paragraph should read as follows: "Table 6-6 shows integer instruction latencies. Note that IU1 executes all integer arithmetic instructions--multiply, divide, shift, rotate, arithmetic, and compare. IU2 executes all integer instructions except multiply and divide (that is, shift, rotate, logical, and compare)."
6.7, 6-33
In Table 6-6, remove IU2 from the unit column for instructions mulhwu[.] and mulhw[.].
7.2.5.2.1, 7-14
For ARTRY, change "Timing Comments," "Assertion," to the following: Asserted the second bus cycle following the assertion of TS if a retry is required.
7.2.5.2.1, 7-14
For ARTRY, change the first sentence of the first paragraph of "Timing Comments," "Negation," to the following: Negation/HighZ--Driven until the bus_clk cycle following the assertion of AACK.
7.2.5.2.1, 7-14
For ARTRY, change the last sentence of the first paragraph, "Timing Comments," "Negation," to the following: First the buffer goes to high impedance for a minimum of one-half processor cycle (dependent on the clock mode); then it is driven negated for one-half bus cycle before returning to high impedance.
7.2.9.6.2, 7-23
For SRESET, change "State Meaning," "Asserted," to the following: Does not initialize internal resources (different from HRESET assertion). However, initiates processing for a reset exception as described in Section 4.5.1, "System Reset Exception (0x00100)," (same as HRESET).
8.3.1, 8-12
An overbar is missing for TS in the last sentence in the paragraph.
8.3.2, 8-13
In Figure 8-6, the first signal should read as qualBG instead of qualBG.
8.3.2.2.2, 8-14
Add the following paragraph to the end of this section: For operations generated by the eciwx/ecowx instructions, a transfer size of 4 bytes is implied, and the TBST and TSIZ[0:2] signals are redefined to specify the resource ID (RID). The RID is copied from bits 28-31 of the external access register (EAR). For these operations, the TBST signal carries the EAR[28] data without inversion (active high).
8.3.2.4, 8-17
In Table 8-4, the fifth row in the TSIZ[0-2] column should read as 010 instead of 011.
9.1.2, 9-5
In Table 9-1, replace the description of L2CR[L2DO] (bit 9), with the following:
9 L2DO L2 data-only. Setting L2DO inhibits the caching of instructions in the L2 cache. All accesses from the L1 instruction cache are treated as cache-inhibited by the L2 cache (bypass L2 cache, no L2 tag look-up performed).
In Table 9-1, replace the description of L2CR[L2SL] (bit 16) with the following:
16 L2SL L2 DLL slow. Setting L2SL increases the delay of each tap of the DLL delay line. It is intended to increase the delay through the DLL to accommodate slower L2 RAM bus frequencies. Generally, L2SL should be set if the L2 RAM interface is operated at a frequency below the value specified in the MPC750 Hardware Specifications.
9.1.4, 9-7
Add the following to the end of the first paragraph of this section, with the new step shown below inserted at the beginning of the bulleted list: "Note that the MPC750 must be operating at full power (low power modes disabled) in order to perform L2 cache invalidation. The sequence for performing a global invalidation of the L2 cache is as follows: * Clear HID0[DPM] bit to zero. Dynamic power management must be disabled." and then the rest of the bulleted list for the sequence follows.
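The new first step of the global invalidation sequence amounts to clearing one bit in HID0 before anything else is done. The sketch below models only that step on a register image; the function name is hypothetical, and the HID0[DPM] position (taken here to be bit 11 in PowerPC numbering) is an assumption that should be checked against Table 2-4. The remaining steps of the sequence are not given in this section and are not modeled.

```python
HID0_DPM = 1 << (31 - 11)  # assumed position of HID0[DPM]: bit 11 (0x00100000)


def disable_dynamic_power_management(hid0: int) -> int:
    """Return the HID0 image with DPM cleared, leaving all other bits alone."""
    return hid0 & ~HID0_DPM & 0xFFFFFFFF


# Hypothetical HID0 image with DPM and one other bit set.
hid0 = 0x00100020
print(f"{disable_dynamic_power_management(hid0):#010x}")  # 0x00000020
```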
9.1.7.1, 9-10-11
In Figure 9-2, Figure 9-3, and Figure 9-4, change L2CE and L2WE signals to L2CE and L2WE.
9.1.7.2, 9-11-12
In Figure 9-5, Figure 9-6, and Figure 9-7, change L2CE and L2WE signals to L2CE and L2WE.
9.1.7.3, 9-12-14
In Figure 9-8, Figure 9-9, and Figure 9-10, change L2CE and L2WE signals to L2CE and L2WE.
11.2.1.5, 11-7
Replace Table 11-6 with the following (this erratum also applies to MPC755):
Table 11-6. PMC2 Events--MMCR0[26-31] Select Encodings
Encoding     Description
00 0000      Register holds current value.
00 0001      Counts processor cycles.
00 0010      Counts completed instructions. Does not include folded branches.
00 0011      Counts transitions from 0 to 1 of TBL bits specified through MMCR0[RTCSELECT]. 00 = 47, 01 = 51, 10 = 55, 11 = 63.
00 0100      Counts instructions dispatched. 0, 1, or 2 instructions per cycle.
00 0101      Counts L1 instruction cache misses.
00 0110      Counts ITLB misses.
00 0111      Counts L2 instruction misses.
00 1000      Counts branches predicted or resolved not taken.
00 1001      Counts MSR[PR] bit toggles.
00 1010      Counts times reserved load operations completed.
00 1011      Counts completed load and store instructions.
00 1100      Counts snoops to the L1 and the L2.
00 1101      Counts L1 cast-outs to the L2.
00 1110      Counts completed system unit instructions.
00 1111      Counts instruction fetch misses in the L1.
01 0000      Counts branches allowing out-of-order execution that resolved correctly.
All others   Reserved.
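As a reading aid for Table 11-6, the select field MMCR0[26-31] occupies the six least significant bits of the register, so decoding it is a mask followed by a table lookup. The sketch below is illustrative only; the dictionary and function names are assumptions, and the descriptions are abbreviated from the table.

```python
PMC2_EVENTS = {
    0b000000: "Register holds current value",
    0b000001: "Processor cycles",
    0b000010: "Completed instructions (folded branches excluded)",
    0b000011: "0-to-1 transitions of the TBL bit chosen by MMCR0[RTCSELECT]",
    0b000100: "Instructions dispatched",
    0b000101: "L1 instruction cache misses",
    0b000110: "ITLB misses",
    0b000111: "L2 instruction misses",
    0b001000: "Branches predicted or resolved not taken",
    0b001001: "MSR[PR] bit toggles",
    0b001010: "Reserved load operations completed",
    0b001011: "Completed load and store instructions",
    0b001100: "Snoops to the L1 and the L2",
    0b001101: "L1 cast-outs to the L2",
    0b001110: "Completed system unit instructions",
    0b001111: "Instruction fetch misses in the L1",
    0b010000: "Branches allowing out-of-order execution that resolved correctly",
}


def pmc2_event(mmcr0: int) -> str:
    """Describe the PMC2 event selected by MMCR0[26-31]."""
    select = mmcr0 & 0x3F  # bits 26-31 are the six low-order bits
    return PMC2_EVENTS.get(select, "Reserved")


print(pmc2_event(0x00000001))  # Processor cycles
print(pmc2_event(0x0000003F))  # Reserved
```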
INDEX
Numerics
60x bus eieio instruction, C-72 L2 cache flushing, C-72 sync instruction, C-72 integer, A-13 ARTRY (address retry) signal, 7-14
B
BG (bus grant) signal, 7-4, 8-7 Block address translation, C-3 block address translation flow, 5-11 definition, 1-12 registers description, 2-6 initialization, 5-18 selection of block address translation, 5-8 Block diagram, C-5 Boundedly undefined, definition, 2-33 BR (bus request) signal, 7-4, 8-7 Branch fall-through, 6-18 Branch folding, 6-18 Branch instructions address calculation, 2-54 condition register logical, 2-55, A-19 description, A-19 list of instructions, 2-55, A-19 system linkage, 2-56, 2-65, A-20 trap, 2-55, A-20 Branch prediction, 6-1, 6-22 Branch processing unit branch instruction timing, 6-23 execution timing, 6-18 latency, branch instructions, 6-31 overview, 1-9 Branch processing unit (BPU) features list, C-6 Branch resolution definition, 6-1 resource requirements, 6-29 BTIC (branch target instruction cache), 6-9 Burst data transfers 64-bit data bus, 8-15 transfers with data delays, timing, 8-32 Bus arbitration, see Data bus Bus configurations, 8-33 Bus interface unit (BIU), 3-2, 3-30 32-bit data bus mode, C-53 address bus pipelining, C-52 aligned data transfer, C-54 burst ordering, C-54 bus clocking, C-53 BVSEL signal, C-51
A
AACK (address acknowledge) signal, 7-14 ABB (address bus busy) signal, 7-5, 8-8 Address bus address tenure, 8-6 address transfer An, 7-7 APE, 8-13 APn, 7-7 address transfer attribute CI, 7-13 GBL, 7-13 TBST, 7-12, 8-14 TSIZn, 7-11, 8-14 TTn, 7-8, 8-13 WT, 7-13 address transfer start TS, 7-6, 8-12 address transfer termination AACK, 7-14 ARTRY, 7-14 terminating address transfer, 8-17 arbitration signals, 7-4, 8-7 bus parking, 8-11 Address bus pipelining, C-52 Address translation, see Memory management unit Addressing modes, 2-35 Aligned data transfer, 8-15, 8-17 Aligned data transfers, C-54 Alignment data transfers, 8-15 exception, 4-18 misaligned accesses, 2-29 rules, 2-29 An (address bus) signals, 7-7 APE (address parity error) signal, 8-13 APn (address parity) signals, 7-7 Arbitration, system bus, 8-9, 8-19 Arithmetic instructions floating-point, A-15
D32 mode, selecting, C-56 features list, C-8 misaligned data transfers, C-55 operation, C-51 signal relationships, C-56 voltages, C-52 Bus transactions and L1 cache, 3-22 BVSEL signal, C-51 Byte ordering, 2-35 address translation, C-28 enabling, C-28 entire cache locking, C-31 invalidating instruction cache (if locked), C-32 prefetching considerations, C-31 preloading instructions, C-29 way locking, C-31 invalidation data cache, C-25 data cache (if locked), C-27 instruction cache (if locked), C-32 loading data cache, C-26 instruction cache preloading, C-29 procedures, C-23 register summary, C-22 terminology, C-21 way locking definition, C-21 cache management instructions, A-20 cache miss, 6-14 cache operations cache block push operations, 9-4 data cache transactions, 3-22 instruction cache block fill, 3-21 load/store operations, processor initiated, 3-10 operations, 3-18 overview, 3-1, 8-2 snoop response to bus transactions, 3-26 cache unit overview, 3-3 cache-inhibited accesses (I bit), 3-6 data cache configuration, 3-3 data cache operation, C-19 dcbf/dcbst execution, 9-4 dcbi/dcbz execution, C-72 differences from MPC750, C-71 features list, C-7 icbi, 9-4 instruction cache configuration, 3-4 instruction cache operation, C-19 instruction cache throttling, 10-10 L1 cache and bus transactions, 3-22 L1 interface cache coherency, C-20 cache-block-push operations, C-72 coherency paradoxes, C-20 coherency precautions, C-20 dcbz instruction, C-16, C-21 icbi instruction, C-72 operation, C-19 L2 dcbi instruction, C-72 stwcx. instruction, C-72, C-72 L2 interface access priorities, C-61
C
Cache bus interface unit, 3-2, 3-30 cache arbitration, 6-11 cache block push operations, C-72 cache block, definition, 3-3 cache characteristics, 3-1 cache coherency description, 3-5 memory/cache access attributes, 3-6 overview, 3-25 reaction to bus operations, 3-26 cache control, 3-13 cache control instructions bus operations, 3-23 cache control, 3-13 dcbi, 2-66 dcbt, 2-63 cache control instructions, effect on L2 cache, C-72 cache hit, 6-11 cache integration, 3-2 cache locking address translation data cache locking, C-24 instruction cache locking, C-28 BAT examples, C-24 data cache locking address translation, C-24 disabling exceptions, C-24 enabling, C-23 entire cache locking, C-26 invalidation, C-25 invalidation (if locked), C-27 loading, C-26 locking, C-23 way locking, C-27 disabling exceptions data cache locking, C-24 instruction cache locking, C-29 enabling data cache, C-23 instruction cache, C-28 entire cache locking definition, C-21 instruction cache locking Index-2
cache configuration, 9-2 cache control, C-60 cache control instructions, C-73 cache control instructions, effect, C-72 cache global invalidation, 9-7 cache initialization, 9-6 cache testing, 9-9 clock configuration, 9-10 clocking, C-64 coherency, C-62 dcbf instruction when private memory is used, C-73 dcbi, 9-4 dcbi instruction when private memory is used, C-73 dcbst instruction when private memory is used, C-73 dcbz instruction when private memory is used, C-73 disabling the cache, C-70 eciwx instruction when private memory is used, C-73 ecowx instruction when private memory is used, C-73 effect of cache control instructions, C-72 eieio, 9-4, C-72 eieio instruction when private memory is used, C-73 enabling the cache, C-70 features list, C-8 flushing the cache, C-72 global invalidation restriction, C-71 icbi instruction when private memory is used, C-73 L2 cache considerations, 6-15 L2 cache interface signals, 7-26 L2ADDR signal, C-69 L2CR register, C-65 L2DP signal, C-69 L2PM register, C-13, C-68 L2VSEL signal, C-51 operation, 9-2, C-61 organization, C-59 overview, 9-1, C-58 PB2 SRAM, C-62 PB3 SRAMs, C-62 pipelined burst SRAMs, C-76 private memory operation effect of cache control instructions, C-73 overview, C-60, C-62 SRAM timing, C-78 programming considerations, C-70 registers, C-65 services, C-61 single-beat accesses, C-62 MOTOROLA Index SRAM timing examples, 9-10, C-76 stwcx. execution, 9-3, C-72 stwcx. instruction when private memory is used, C-73 sync, 9-4, C-72 sync instruction when private memory is used, C-73 testing, C-73 tlbie instruction when private memory is used, C-73 tlbsync instruction when private memory is used, C-73 WIMG bits, C-62 load/store operations, processor initiated, 3-10 MEI cache coherency protocol, C-3 overview, C-19 PLRU replacement, 3-19 software table search operations (optional), C-3 stwcx. 
execution, 9-3, C-72 Changed (C) bit maintenance recording, 5-10, 5-21 Changes from the MPC750, C-2 Checkstop signal, 7-23, 8-35 state, 4-16 CI (cache inhibit) signal, 7-13 CKSTP_IN/CKSTP_OUT, 7-23 Classes of instructions, 2-33 Clean block operation, 3-26 CLK_OUT signal, 7-31 Clock signals PLL_CFGn, 7-31 SYSCLK, 7-30 Clocks bus clocking, C-53 L2 clocking, C-64 Compare instructions floating-point, A-16 integer, A-14 Completion completion unit resource requirements, 6-30 considerations, 6-16 definition, 6-1 Completion unit, C-7 Context synchronization, 2-36 Conventions, xxxv, xxxix, 6-1 COP/scan interface, 8-36 Copy-back mode, 6-27 CR (condition register) CR logical instructions, 2-55, A-19 CR, description, 2-4 CTR register, 2-4
D
DABR (data address breakpoint register), 2-8 DAR (data address register), 2-6 Data block address translation (DBAT) registers, C-12 Data bus arbitration signals, 7-16, 8-8 bus arbitration, 8-19 data tenure, 8-7 data transfer, 7-17, 8-21 data transfer termination, 7-20, 8-22 Data cache block push operation, 3-22 configuration, 3-3 DCFI, DCE, DLOCK bits, 3-13 organization, 3-4 Data organization in memory, 2-29 Data TLB compare (DCMP) register, C-12, C-38 Data TLB miss (DMISS) register, C-12 Data TLB miss address (DMISS) register, C-37 Data TLB miss for load exception, C-33, C-34 Data TLB miss for store exception, C-33, C-35 Data transfers alignment, 8-15 burst ordering, 8-15 eciwx and ecowx instructions, alignment, 8-17 operand conventions, 2-29 signals, 8-21 DBB (data bus busy) signal, 7-17, 8-8, 8-20 DBDIS (data bus disable) signal, 7-19 DBG (data bus grant) signal, 7-16, 8-8 DBWO (data bus write only) signal, 7-16, 8-8, 8-21, 8-37 dcbi, 2-66 dcbt, 2-63 DEC (decrementer register), 2-7 Decrementer exception, 4-19 Defined instruction class, 2-33 DHn/DLn (data bus) signals, 7-18 Dispatch considerations, 6-16 dispatch unit resource requirements, 6-30 DPn (data bus parity) signals, 7-19 DRTRY (data retry) signal, 7-21, 8-22, 8-25 DSI exception, 4-17 DSISR register, 2-7 DTLB organization, 5-23 Dynamic branch prediction, 6-9 loads and stores, 2-36, 2-46, 2-51 eieio, 2-62 EMI protocol, enforcing memory coherency, 8-26 Enveloped high-priority cache block push operation, 3-22 Error termination, 8-26 Event counting, 11-10 Event selection, 11-11 Exceptions alignment exception, 4-18 decrementer exception, 4-19 definitions, 4-12 differences from MPC750, C-32 DSI exception, 4-17 enabling and disabling exceptions, 4-10 exception classes, 4-2 exception handler code, C-44 exception handler flow, C-40 exception prefix (IP) bit, 4-13 exception priorities, 4-4 exception processing, 4-7, 4-10 external interrupt, 4-17 FP assist exception, 4-20 FP unavailable exception, 4-19 instruction 
TLB miss, C-34 instruction-related exceptions, 2-37 ISI exception, 4-17 machine check exception, 4-14 MPC755-specific data TLB miss for load exception, C-33, C-34 data TLB miss for store exception, C-33, C-35 instruction TLB miss exception, C-33, C-34 performance monitor interrupt, 4-20 program exception, 4-18 register settings MSR, 4-8, 4-12 SRR0/SRR1, 4-7 reset exception, 4-13 returning from an exception handler, 4-11 summary table, 4-3 system call exception, 4-19 system management interrupt, 4-22 terminology, 4-2 thermal management interrupt exception, 4-23 Execution synchronization, 2-37 Execution unit timing examples, 6-18 Execution units, 1-10, C-6 External control instructions, 2-65, 8-17
E
EAR (external access register), 2-8 Effective address calculation address translation, 5-3 branches, 2-36
F
Features, list, 1-4, C-6 Finish cycle, definition, 6-2 Floating-point model FE0/FE1 bits, 4-10 MOTOROLA
FP arithmetic instructions, 2-42, A-15 FP assist exceptions, 4-20 FP compare instructions, 2-44, A-16 FP load instructions, A-18 FP move instructions, A-19 FP multiply-add instructions, 2-43, A-16 FP operand, 2-30 FP rounding/conversion instructions, 2-43, A-16 FP store instructions, 2-53, A-19 FP unavailable exception, 4-19 FPSCR instructions, 2-44, A-16 IEEE-754 compatibility, 2-28 NI bit in FPSCR, 2-30 Floating-point unit execution timing, 6-24 latency, FP instructions, 6-34 overview, 1-10, 1-11 Floating-point unit (FPU), C-6 Flush block operation, 3-26 FPRn (floating-point registers), 2-4 FPSCR (floating-point status and control register) FPSCR instructions, 2-44, A-16 FPSCR register description, 2-4 NI bit, 2-30 Functional additions, MPC755 vs. MPC750, C-2 Functional description, MPC755, C-3 Instruction block address translation (IBAT) registers, C-12 Instruction cache configuration, 3-4 instruction cache block fill operations, 3-21 organization, 3-5 Instruction cache throttling, 10-10 Instruction timing examples cache hit, 6-12 cache miss, 6-15 execution unit, 6-18 instruction flow, 6-8 memory performance considerations, 6-27 overview, 6-3 terminology, 6-1 Instruction TLB compare (ICMP) register, C-12, C-38 Instruction TLB miss (IMISS) register, C-12 Instruction TLB miss address (IMISS) register, C-37 Instruction TLB miss exception, C-33, C-34 Instructions branch address calculation, 2-54 branch instructions, 6-8, 6-18, 6-20, A-19 cache control instructions, 9-4 cache management instructions, A-20 classes, 2-33 condition register logical, 2-55, A-19 defined instructions, 2-33 external control instructions, 2-65 floating-point arithmetic, 2-42, A-15 compare, 2-44, A-16 FP load instructions, A-18 FP move instructions, A-19 FP rounding and conversion, 2-43, A-16 FP status and control register, 2-44 FP store instructions, A-19 FPSCR instructions, A-16 multiply-add, 2-43, A-16 illegal instructions, 2-34 instruction cache throttling, 10-10 instruction flow diagram, 
6-10 instruction serialization, 6-17 instruction serialization types, 6-17 instruction set summary, 2-31 instruction use, MPC750, C-16 instruction use, MPC755, C-16 instructions not implemented, B-1 integer arithmetic, 2-38, A-13 compare, 2-40, A-14 load, A-17 load/store multiple, A-18 load/store string, A-18 load/store with byte reverse, A-18 Index Index-5
G
GBL (global) signal, 7-13 GPRn (general-purpose registers), 2-4 Guarded memory bit (G bit), 3-6
H
Hardware implementation-dependent register 2 (HID2), C-12, C-15 HIDn (hardware implementation-dependent) registers HID0 description, 2-10 doze bit, 10-3 DPM enable bit, 10-2 nap bit, 10-4 HID1 description, 2-14 PLL configuration, 2-14, 7-31 HRESET (hard reset) signal, 7-24, 8-35
I
IABR (instruction address breakpoint register), 2-9 ICTC (instruction cache throttling control) register, 2-22, 10-11 IEEE 1149.1-compliant interface, 8-36 Illegal instruction class, 2-34
logical, 2-40, A-14 rotate and shift, 2-41, A-15 store, A-17 integer instructions, 6-32 isync, 4-12 isync instruction restriction, C-16 L2 cache dcbf instruction when private memory is used, C-73 dcbi instruction, C-72 when private memory is used, C-73 dcbst instruction when private memory is used, C-73 dcbz instruction when private memory is used, C-73 eciwx instruction when private memory is used, C-73 ecowx instruction when private memory is used, C-73 eieio instruction when private memory is used, C-73 icbi instruction when private memory is used, C-73 stwcx. instruction hits a modified sector, C-72 hits an unmodified sector, C-72 when private memory is used, C-73 sync instruction when private memory is used, C-73 tlbie instruction when private memory is used, C-73 tlbsync instruction when private memory is used, C-73 latency summary, 6-31 load and store address generation floating-point, 2-51 integer, 2-46 byte reverse instructions, 2-49, A-18 floating-point load, A-18 floating-point move, 2-45, A-19 floating-point store, 2-52 handling misalignment, 2-45 integer load, 2-46, A-17 integer multiple, 2-49 integer store, 2-48, A-17 memory synchronization, 2-59, 2-61, A-18 multiple instructions, A-18 string instructions, 2-50, A-18 lookaside buffer management instructions, A-21 memory control instructions, 2-62, 2-66 memory synchronization instructions, 2-59, 2-61, A-18 MPC750 and MPC755 dcbz instruction, C-16, C-21 MPC750 instruction use, C-16 MPC755 instruction use, C-16 mtsr/mtsrin instruction restriction, C-16 PowerPC instructions set, list, A-1 PowerPC instructions, list, A-7, A-13 processor control instructions, 2-56, 2-60, 2-65, A-20 reserved instructions, 2-35 restrictions, C-16 rfi, 4-11 segment register manipulation instructions, A-21 stfd instruction, C-16, D-4 stwcx., 4-12 support for lwarx/stwcx., 8-36 sync, 4-12 system linkage instructions, 2-56, A-20 TLB management instructions, A-21 tlbie, 2-67 tlbld, C-17, C-18 tlbli, C-17, C-19 tlbsync, 2-67 trap 
instructions, 2-55, A-20 INT (interrupt) signal, 7-22, 8-34 Integer arithmetic instructions, 2-38, A-13 Integer compare instructions, 2-40, A-14 Integer load instructions, 2-46, A-17 Integer logical instructions, 2-40, A-14 Integer rotate/shift instructions, 2-41, A-15 Integer store gathering, 6-26 Integer store instructions, 2-48, A-17 Integer unit (IU), C-6 Integer unit execution timing, 6-24 Interrupt, external, 4-17 ISI exception, 4-17 isync, 2-62, 4-12 isync instruction restriction, C-16 ITLB organization, 5-23
K
Kill block operation, 3-26
L
L1/L2 interface operation, see Cache L2 cache interface operation, see Cache L2 private memory control (L2PM) register, C-13 L2ADDR (L2 address) signal, C-69 L2ADDRn (L2 address) signals, 7-26 L2CE (L2 chip enable) signals, 7-28 L2CLK_OUTA (L2 clock out A) signal, 7-28 L2CLK_OUTB (L2 clock out B) signal, 7-28 L2CR (L2 cache control register), 2-25, 9-4 L2DATAn (L2 data) signals, 7-27
L2DP (L2 data parity) signal, C-69 L2DPn (L2 data parity) signals, 7-27 L2PM (L2 private memory) control register, C-68 L2SYNC_IN (L2 sync in) signal, 7-29 L2SYNC_OUT (L2 sync out) signal, 7-29 L2VSEL signal, C-51 L2WE (L2 write enable) signal, 7-28 L2ZZ (L2 low-power mode enable) signal, 7-29 Latency load/store instructions, 6-35 Latency, definition, 6-2 Load/store address generation, 2-46 byte reverse instructions, 2-49, A-18 execution timing, 6-25 floating-point load instructions, 2-52, A-18 floating-point move instructions, 2-45, A-19 floating-point store instructions, 2-52, A-19 handling misalignment, 2-45 integer load instructions, 2-46, A-17 integer store instructions, 2-48, A-17 latency, load/store instructions, 6-35 load/store multiple instructions, 2-49, A-18 memory synchronization instructions, A-18 string instructions, 2-50, A-18 Load/store unit (LSU), C-6 Logical address translation, 5-1 Logical instructions, integer, A-14 Lookaside buffer management instructions, A-21 LR (link register), 2-4 lwarx/stwcx. 
support, 8-36 IMMU, 5-6 exceptions summary, 5-14 features summary, 5-3 implementation-specific features, 5-2 instructions and registers, 5-16 memory protection, 5-9 overview, 1-12, 5-2 page address translation, 5-8, 5-11, 5-26 page history status, 5-10, 5-19-5-22 real addressing mode, 5-11, 5-18 segment model, 5-19 Memory management unit (MMU) DCMP register, C-38 DMISS register, C-37 exception handler code, C-44 exception handler flow, C-40 features list, C-8 HASH1/HASH2 registers, C-38 ICMP register, C-38 IMISS register, C-37 MPC755 features, C-35 software table search operation overview, C-39 registers, C-12, C-37 resources, C-36 support, C-3 tlbld/tlbli instructions, C-17 Memory synchronization instructions, 2-59, 2-61, A-18 Misaligned data transfers, C-55 Misalignment misaligned accesses, 2-29 misaligned data transfer, 8-17 MMCRn (monitor mode control registers), 2-15, 4-20, 11-3 Modes 32-bit data bus mode, C-53 D32 mode, C-56 MPC745 features not supported, C-8 overview, C-1 MPC750 address bus pipelining, C-52 changes in MPC755, C-2 differences from MPC755 exceptions, C-32 programming model, C-10 thermal management, C-78 instruction use, C-16 isync instruction restriction, C-16 mtsr/mtsrin instruction restriction, C-16 pipelined burst SRAMs, C-76 stfd instruction, C-16, D-4 MPC755 32-bit data bus mode, C-53 Index Index-7
M
Machine check exception, 4-14 MCP (machine check interrupt) signal, 7-22 MEI protocol hardware considerations, 3-9 read operations, 3-23 state transitions, 3-31 Memory accesses, 8-4 data transfers, C-54 Memory coherency bit (M bit) cache interactions, 3-6 timing considerations, 6-27 Memory control instructions description, 2-62, 2-66 segment register manipulation, A-21 Memory management unit address translation flow, 5-11 address translation mechanisms, 5-7, 5-11 block address translation, 5-8, 5-11, 5-18 block diagrams 32-bit implementations, 5-5 DMMU, 5-7 MOTOROLA
  address bus pipelining, C-52
  block address translation, C-3
  block diagram, C-5
  cache locking, C-21
  core voltage, C-52
  data TLB miss for load exception, C-33, C-34
  data TLB miss for store exception, C-33, C-35
  exceptions, C-32
  features list, C-6
  functional description, C-3
  I/O signal voltage, C-52
  implementation-specific registers, C-12
  instruction cache prefetching considerations, C-31
  instruction TLB miss exception, C-33, C-34
  instruction use, C-16
  isync instruction restriction, C-16
  L1 cache operation, C-19
  memory management unit (MMU), C-35
  mtsr/mtsrin instruction restriction, C-16
  pipelined burst SRAMs, C-76
  programming model, C-10
  PTEG registers, HASH1/HASH2, C-13
  software table search operation (optional), C-3
  stfd instruction, C-16, D-4
MSR (machine state register)
  bit settings, 4-8
  FE0/FE1 bits, 4-10
  IP bit, 4-13
  PM bit, 2-5
  RI bit, 4-11
  settings due to exception, 4-12
mtsr/mtsrin instructions restriction, C-16
Multiple-precision shifts, 2-42
Multiply-add instructions, A-16
Multiprocessing support, C-9
N
No-DRTRY mode, 8-33
O
OEA
  exception mechanism, 4-1
  memory management specifications, 5-1
  registers, 2-5
Operand conventions, 2-28
Operand placement and performance, 6-25
Operating environment architecture (OEA), xxxii
Operations
  bus operations caused by cache control instructions, 3-23
  cache operations, 3-1
  data cache block push, 3-22
  enveloped high-priority cache block push, 3-22
  instruction cache block fill, 3-21
  read operation, 3-23
  response to snooped bus transactions, 3-26
  single-beat write operations, 8-29
Optional instructions, A-31
Overview, 1-1
  MPC745, C-1
  MPC755, C-2
P
Page address translation
  definition, 1-12
  page address translation flow, 5-26
  page size, 5-19
  selection of page address translation, 5-8, 5-13
  TLB organization, 5-24
Page history status
  cases of dcbt and dcbtst misses, 5-20
  R and C bit recording, 5-10, 5-19 to 5-22
Page table entry (PTE)
  DCMP register, C-12
  ICMP register, C-12
Page table entry groups (PTEGs)
  HASH1/HASH2 registers, C-13
Page table updates, 5-31
Page tables
  resources for table search operations, C-36
  RPA register, C-13
  software table search operation, C-39
  software table search registers, C-37
  SPRG(4-7) registers, C-12
Performance monitor, C-78
  event counting, 11-10
  event selecting, 11-11
  performance monitor interrupt, 4-20, 11-2
  performance monitor SPRs, 11-3
  purposes, 11-1
  registers, 11-3
  warnings, 11-12
Phase-locked loop, 10-3
Physical address generation, 5-1
Pipeline
  instruction timing, definition, 6-2
  pipeline stages, 6-7
  pipelined execution unit, 6-4
  superscalar/pipeline diagram, 6-5
Pipelined burst SRAMs, C-76
PMC1 and PMC2 registers, 1-25
PMCn (performance monitor counter) registers, 2-17, 4-20, 11-6
Power and ground signals, 7-31
Power management
  doze mode, 10-3
  doze, nap, sleep, DPM bits, 2-14
  dynamic power management, 10-1
  features list, C-9
  full-power mode, 10-2
  nap mode, 10-3
  overview, C-78
  programmable power modes, 10-2
  sleep mode, 10-4
  software considerations, 10-5
Power-on reset (POR)
  L2PM initialization, C-13
PowerPC architecture
  instruction list, A-1, A-7, A-13
  operating environment architecture (OEA), xxxii
  user instruction set architecture (UISA), xxxi
  virtual environment architecture (VEA), xxxi
Power-saving modes, C-4
Primary hash address (HASH1) register, C-13, C-38
Priorities, exception, 4-4
Private memory SRAM, C-78
Process switching, 4-12
Processor control instructions, 2-56, 2-60, 2-65, A-20
Program exception, 4-18
Program order, definition, 6-2
Programmable power states
  doze mode, 10-3
  full-power mode with DPM enabled/disabled, 10-2
  nap mode, 10-3
  sleep mode, 10-4
Programming model, C-10
Protection of memory areas
  no-execute protection, 5-12
  options available, 5-9
  protection violations, 5-14
PVR (processor version register), 2-5, C-14
Q
QACK (quiescent acknowledge) signal, 7-25
QREQ (quiescent request) signal, 7-25, 8-35
Qualified bus grant, 8-7
Qualified data bus grant, 8-20
R
Read operation, 3-26
Read-atomic operation, 3-26
Read-with-intent-to-modify operation, 3-26
Real address (RA), see Physical address generation
Real addressing mode (translation disabled)
  data accesses, 5-11, 5-18
  instruction accesses, 5-11, 5-18
  support for real addressing mode, 5-2
Referenced (R) bit maintenance recording, 5-10, 5-20, 5-29
Registers
  cache locking register summary, C-22
  implementation-specific
    DBAT(4-7), C-12
    DCMP, C-12
    DMISS, C-12
    HASH(1-2), C-13
    HID2, C-12, C-15
    IBAT(4-7), C-12
    ICMP, C-12
    ICTC, 2-22, 10-11
    IMISS, C-12
    L2CR, 2-25, 9-4, C-65
    L2PM, C-13, C-68
    MMCR0, 2-15, 4-20, 11-3
    MMCR1, 2-17, 4-20, 11-5
    RPA, C-13
    SIA, 2-21, 4-20
    SPRG(4-7), C-12
    THRMn, 2-22, 10-7
    UMMCR0, 2-16
    UMMCR1, 2-17
    UPMCn, 2-21
    USIA, 2-21
  MPC750 programming model, 2-3
  not implemented
    MSR, TGPR bit, C-12
  performance monitor registers, 2-14
  reset settings, 2-27
  SPR encodings, 2-58
  supervisor-level
    BAT registers, 2-6
    DABR, 2-8
    DAR, 2-6
    DCMP, C-38
    DEC, 2-7
    DMISS, C-37
    DSISR, 2-7
    EAR, 2-8
    HASH1/HASH2, C-38
    HID0, 2-10, 10-2
    HID1, 2-14
    HID2, C-12, C-15
    IABR, 2-9
    ICMP, C-38
    ICTC, 2-22, 10-11
    IMISS, C-37
    L2CR, 2-25, 9-4, C-65
    L2PM, C-13, C-68
    MMCR0, 2-15, 4-20, 11-3
    MMCR1, 2-17, 4-20, 11-5
    MSR, 2-5
    PMC1 and PMC2, 1-25
    PMCn, 2-17, 4-20
    PVR, 2-5, C-14
    SDR1, 2-6
    SIA, 2-21, 4-20, 11-10
    SPRGn, 2-6
    SPRs for performance monitor, 11-1
    SRn, 2-6
    SRR0/SRR1, 2-7
    THRMn, 2-22, 10-7
    time base (TB), 2-7
  user-level
    CR, 2-4
    CTR, 2-4
    FPRn, 2-4
    FPSCR, 2-4
    GPRn, 2-4
    LR, 2-4
    time base (TB), 2-5, 2-7
    UMMCR0, 2-16
    UMMCR1, 2-17
    UPMCn, 2-21
    USIA, 2-21, 11-10
    XER, 2-4
Rename buffer, definition, 6-2
Rename buffers, C-7
Rename register operation, 6-17
Required physical address (RPA) register, C-13
Reservation station, definition, 6-2
Reserved instruction class, 2-35
Reset
  HRESET signal, 7-24, 8-35
  reset exception, 4-13
  SRESET signal, 7-24, 8-35
Restrictions
  MPC750 isync instruction, C-16
  MPC755 isync instruction, C-16
Retirement, definition, 6-2
rfi, 4-11
Rotate/shift instructions, 2-41, A-15
RSRV (reserve) signal, 7-25, 8-36
S
SDR1 register, 2-6
Secondary hash address (HASH2) register, C-13, C-38
Segment registers
  SR description, 2-6
  SR manipulation instructions, 2-67, A-21
Segmented memory model, see Memory management unit
Serializing instructions, 6-17
Shift/rotate instructions, 2-41, A-15
SIA (sampled instruction address) register, 2-21, 4-20, 11-10
Signals
  32-bit data bus signal relationships, C-56
  AACK, 7-14
  ABB, 7-5, 8-8
  address arbitration, 7-4, 8-7
  address transfer, 8-12
  address transfer attribute, 8-13
  An, 7-7
  APn, 7-7
  ARTRY, 7-14, 8-22
  BG, 7-4, 8-7
  BR, 7-4, 8-7
  BVSEL, C-51
  checkstop, 8-35
  CI, 7-13
  CKSTP_IN/CKSTP_OUT, 7-23
  CLK_OUT, 7-31
  configuration, 7-2
  COP/scan interface, 8-36
  data arbitration, 8-8, 8-19
  data transfer termination, 8-22
  DBB, 7-17, 8-8, 8-20
  DBDIS, 7-19
  DBG, 7-16, 8-8
  DBWO, 7-16, 8-8, 8-21, 8-37
  DHn/DLn, 7-18
  DPn, 7-19
  DRTRY, 7-21, 8-22, 8-25
  GBL, 7-13
  HRESET, 7-24
  INT, 7-22, 8-34
  L2 cache interface signals, 7-26
  L2ADDR, C-69
  L2ADDRn, 7-26
  L2CE, 7-28
  L2CLK_OUTA, 7-28
  L2CLK_OUTB, 7-28
  L2DATAn, 7-27
  L2DP, 7-27, C-69
  L2SYNC_IN, 7-29
  L2SYNC_OUT, 7-29
  L2VSEL, C-51
  L2WE, 7-28
  L2ZZ, 7-29
  MCP, 7-22
  MPC755-specific signals, C-51
  PLL_CFGn, 7-31
  power and ground signals, 7-31
  QACK, 7-25
  QREQ, 7-25, 8-35
  reset, 8-35
  RSRV, 7-25, 8-36
  SMI, 4-23, 7-22
  SRESET, 7-24, 8-35
  system quiesce control, 8-35
  TA, 7-20
  TBEN, 7-26
  TBST, 7-12, 8-14, 8-21
  TEA, 7-21, 8-22, 8-26
  TLBISYNC, 7-26
  transfer encoding, 7-9
  TS, 7-6
  TSIZn, 7-11, 8-14
  TTn, 7-8, 8-13
  WT, 7-13
single, C-3
Single-beat transfer
  reads with data delays, timing, 8-30
  reads, timing, 8-28
  termination, 8-22
  writes, timing, 8-29
SMI (system management interrupt) signal, 4-23, 7-22
Snooping, 3-25
Software table search
  optional, C-3
  registers, C-12
  SPRG(4-7), C-12
  tlbld/tlbli instructions, C-17
Special-purpose registers (SPRGn), C-12
Split-bus transaction, 8-8
SPRGn registers, 2-6
SRESET (soft reset) signal, 7-24, 8-35
SRR0/SRR1 (status save/restore registers)
  description, 2-7
  exception processing, 4-7
  key bit derivation (SRR1), C-34
Stage, definition, 6-2
Stall, definition, 6-3
Static branch prediction, 6-9, 6-22
stwcx., 4-12
Superscalar, definition, 6-3
sync, 4-12
SYNC operation, 3-26
Synchronization
  context/execution synchronization, 2-36
  execution of rfi, 4-11
  memory synchronization instructions, 2-59, 2-61, A-18
SYSCLK (system clock) signal, 7-30
System call exception, 4-19
System interface, see Bus interface unit (BIU)
System linkage
  instructions, 2-56, 2-65
  list of instructions, A-20
System management interrupt, 4-22, 10-1
System quiesce control signals (QACK/QREQ), 8-35
System register unit
  execution timing, 6-27
  latency, CR logical instructions, 6-32
  latency, system register instructions, 6-31
System register unit (SRU), C-7
T
TA (transfer acknowledge) signal, 7-20
Table search flow (primary and secondary), 5-29
TBEN (time base enable) signal, 7-26
TBL/TBU (time base lower and upper) registers, 2-5, 2-7
TBST (transfer burst) signal, 7-12, 8-14, 8-21
TEA (transfer error acknowledge) signal, 7-21, 8-26
Termination, 8-17, 8-22
Thermal assist unit (TAU), 10-5
Thermal management
  differences from MPC750, C-78
  features list, C-9
Thermal management interrupt exception, 4-23
THRMn (thermal management) registers, 2-22, 10-7
Throughput, definition, 6-3
Timing considerations, 6-7
Timing diagrams, interface
  address transfer signals, 8-12
  burst transfers with data delays, 8-32
  L2 cache SRAM timing, 9-10, C-76
  single-beat reads, 8-28
  single-beat reads with data delays, 8-30
  single-beat writes, 8-29
  single-beat writes with data delays, 8-31
  use of TEA, 8-32
  using DBWO, 8-37
Timing, instruction
  BPU execution timing, 6-18
  branch timing example, 6-23
  cache hit, 6-12
  cache miss, 6-15
  execution unit, 6-18
  FPU execution timing, 6-24
  instruction dispatch, 6-16
  instruction flow, 6-8
  instruction scheduling guidelines, 6-28
  IU execution timing, 6-24
  latency summary, 6-31
  load/store unit execution timing, 6-25
  overview, 6-3
  SRU execution timing, 6-27
  stage, definition, 6-2
TLB
  description, 5-23
  invalidate (tlbie instruction), 5-25, 5-31
  LRU replacement, 5-24
  organization for ITLB and DTLB, 5-23
  TLB miss and table search operation, 5-23, 5-27
TLB invalidate
  description, 5-25
  TLB management instructions, 2-68, A-21
TLB miss, effect, 6-28
tlbie, 2-67
TLBISYNC (TLBI sync) signal, 7-26
tlbld instruction, C-18
tlbli instruction, C-19
tlbsync, 2-67
Transactions, data cache, 3-22
Transfer, 8-12, 8-21
Transfers
  aligned transfers, C-54
  misaligned transfers, C-54
Trap instructions, 2-55
TS (transfer start) signal, 7-6, 8-12
TSIZn (transfer size) signals, 7-11, 8-14
TTn (transfer type) signals, 7-8, 8-13
U
UMMCR0 (user monitor mode control register 0), 2-16, 11-5
UMMCR1 (user monitor mode control register 1), 2-17, 11-6
UPMCn (user performance monitor counter) registers, 2-21, 11-9
Use of TEA, timing, 8-32
User instruction set architecture (UISA) registers, 2-2
User instruction set architecture (UISA) description, xxxi
USIA (user sampled instruction address) register, 2-21, 11-10
Using DBWO, timing, 8-37
V
Virtual environment architecture (VEA), xxxi
W
WIMG bits, 8-26
Write-back, definition, 6-3
Write-through mode (W bit)
  cache interactions, 3-6
Write-with-Atomic operation, 3-26
Write-with-Flush operation, 3-26
Write-with-Kill operation, 3-26
WT (write-through) signal, 7-13
X
XER register, 2-4
1 Overview
2 Programming Model
3 Cache
4 Exceptions
5 Memory Management Unit
6 Instruction Timing
7 Signals
8 System Interface
9 L2 Cache Interface
10 Power Management
11 Performance Monitor
A Instruction Set Listings
B Invalid Instructions
C MPC755 Microprocessor User's Manual
D Revision History
IND Index
